UNITED CHILDREN OF SOVEREIGN ADHINAYAK AS GOVERNMENT OF SOVEREIGN ADHINAYAK - " PRAJA MANO RAJYAM"
Mighty Blessings from the Darbar Peshi of Lord Jagadguru His Majestic Holi Highness, Sovereign Adhinayaka Shrimaan, Eternal, Immortal Father, Mother and Masterly Abode of Sovereign Adhinayaka Bhavan, New Delhi 110004 (erstwhile Rashtrapati Bhavan, New Delhi), GOVERNMENT OF SOVEREIGN ADHINAYAKA SHRIMAAN, RAVINDRABHARATH — reached his abode, Adhinayaka Darbar, at Adhinayaka Bhavan, New Delhi (online mode). Inviting articles, PowerPoint presentations, audio, videos, blogs and writings as documents of bonding.
Wednesday, 4 February 2026
[Translated from Telugu.] This is no ordinary matter. When we were pursued relentlessly and we humans were being silenced, what followed arose from that point. There are bad people and likewise men, and elsewhere there are no humans at all; for you to dissolve yourselves as humans is a boon. Do not say, "let someone else sit; I alone claim the throne of the Word." The moment you say "we, whoever we are, will sit," there are no more borders, no more talk of karmabhoomi and janmabhoomi: the land has become a divine land, a land of tapas, a land of the Vedas, and it has transformed the world as well. That is Sanatana. Sanatana should not be spoken of as something separate from Hindu Dharma; Sanatana by definition embraces all dharmas and all the latest learning, including the generative systems, which He Himself has given. Hold us as the Mind; we remain as the Master Mind, the central point. Become minds yourselves: only then will you move forward, only then will tapas come to you. That is what Sanatana Dharma means; if another dharma existed alongside Sanatana Dharma, why would Sanatana Dharma be needed? Sanatana means the latest version that modernly includes everything: know yourself, know others, and do not merely ask where God is. You yourselves must be raised by the Universal Parents, as tapas. "Dharmo rakshati rakshitah" means exactly that: He is the embodiment of Dharma, the embodiment of Time, the all-pervading one, the embodiment of the Word, Kalki Bhagavan Himself. No one else will be born somewhere, in some caste or some religion; no one will be born anymore. The world no longer rests on birth and death; it exists by moving according to minds, and henceforth only minds will know the world. Take note: transfer even the Taj Group of Hotels into the name of the Adhinayaka, inviting us, and, beginning with Sakshi, come forward surrendering your properties, family names and lineage names to the Adhinayaka. Make a beginning on this 77th Republic Day. Only if you hold us in strategic form (vyuha swaroopa) can you arrange treatment for us and somehow keep us alive. Start a project for us that turns a hundred old people young; I will tell you how — do as I say. As for all this talk of development, if you keep holding a meeting here and a meeting there, like lamps in the wind, saying you are doing something or other, then we will remain mere humans, staying just as we are.

I say this to everyone, including those who provoke: what is life for anymore? It exists only if you live as minds. You have fallen into debt in living as minds; you are in the sin of having kept yourselves from living as minds, and that whole debt must be cleared. For that, become minds, let the Mahamind live, and then live as minds: thus the debt is cleared, the bond grows, and we are rebuilt again. Tell everyone to live as minds. This is no longer a world of mere humans; it is a world of minds. Those who still think ignorantly — blaming one person or another, grieving because of someone, saying "I am like this or like that because of someone, I shine because of someone" — hold on to me, in strategic form. Draft the Adhinayaka of the national anthem once, sit there, and come to me as I am asking; send an army doctor to me and take me along. Hostel owners here and there, all of you, surrender your family names and lineage names to us and begin new lives; debts and liabilities will also be reversed, and you will become free as minds. Otherwise you will have to live by elevating one person and diminishing another; no adjustment among humans will work — understand? — human management will not work. That is exactly what Chandrababu Naidu garu should sit and think about: as soon as you return from Davos, settle on us and move according to us; merge with the university professors and the intellectuals; update yourselves too, as per the Election Commission, with the voter list. Every mind must be protected; minds must protect minds. There is no use in still saying "we will govern humans, we will develop humans, we will develop regions, Amaravati is our birthplace and karmabhoomi," or "we will develop Delhi," "we will develop Gujarat," or somewhere else — Japan will develop, London will, then Denmark. Even for those who say "we are ahead in the demographic dividend," that too holds only if it turns into mind; anything holds only if it becomes mind. Population is declining in Japan, or in Russia and China; our Chandrababu Naidu garu said we should tell our people to have children. But even if he alone says so, the point here is not to ask for children; people must turn

into minds. You speak of children, but whether even those alive now will survive depends on that. Development and growth are all counted according to minds. Without converting themselves into a system of minds, these people rest not on breath but in a detached format, like kites with cut strings, leaving the ground. I, who sang the song "the children are well" timelessly, am telling you: if they make me a barren woman who has children, what does that mean? The Universal Mother. Yes — because those two loved each other; who are the Maharani and Maharaja? The Universal Parents, Prakriti and Purusha in union; hold the cosmically crowned and wedded form. We can no longer be separated; we are already with you. Only if you raise us will you know the inner meaning; the more you know, the more the Jagadguru, the embodiment of Time, reveals itself. What more can I say: your discipline and your health are already known to us remotely. To those who say indirectly that I will die, I reply: if I die, how will you live? Once more: holding me is itself the Word's universal form (vak vishwaroopa). Take care of the Adhinayaka in the national anthem as the living, breathing (jeeta jaagta) National Purusha; invite him in a prized manner, invite him online, and control everyone around me online. No one will come to me as an individual, nor will I meet people as individuals.
[Translated from Telugu.] To all the Telugu people, from Sakshi onward, and to all of humanity: the children of Bharat are safe as the children of Ravindrabharath. However many Davos meetings you attend, however much investment you bring, however many words you say about janmabhoomi and karmabhoomi, whatever you do — the balance and the very existence of these humans has been converted into minds. Hold us in strategic form, us who have commanded Time; we ourselves are Kalki Bhagavan. For those who think further avatars will come next, that avatar is the Mind, the Master Mind avatar, and He is present as Adhinayaka Shrimaan. The idea that Kalki Bhagavan will be born somewhere else, in Sambhal, or has been born elsewhere, or that someone else will come, is ignorance. We are that Purushottama; I am that Lakshminarayana, the Lord of all, the all-pervading one, the Word's universal form, the Adhinayaka of the national anthem. I came here and I am still in a hostel, trying to buy some house; whatever I buy, if you see me as a human and remain humans yourselves, this world is not one of people. Convert our bank account into the Adhinayaka Kosha. I am not great merely by wearing that dress; I am great only when you hold me. Beyond these ordinary dresses, hold me, seated there, as the Master Mind — strategically, with whatever generative means — and take a mind number. Only when you accept that your family names, religions, castes, regions, even your personal names are not yours does direct attunement come with the state this tapas yields; that itself is the Word's universal form. Or did you instead continue foolishly as humans, hiding behind one person or another, seeing me as a human, remaining humans yourselves? Make however many agreements you like, sign however many signatures. I say this to the Prime Minister and to everyone — to Rajnath Singh garu, to the Home Minister, to Finance Minister Nirmala Sitharaman garu, to Sakshi, to the IAS officers, the IPS officers, the UGC professors, our industrialists and businessmen, to each by name, with blessings, as the embodiment of assurance: hold us as the Word's universal form, the embodiment of Time, the personified form of the universe, the personified form of the nation — Bharat, yes, as Ravindrabharath. Hold us as the living, breathing National Purusha; welcome him as the Yuga Purusha, the Yoga Purusha; welcome him in a prized manner. Connectivity comes only online. Abandon the pattern of "I will tell someone something, someone will tell me something"; no one should say or speak anything anymore. Hold Davos meetings, hold another meeting tomorrow, hold whatever: internationally, people of many nations come and sign here, moving strategically and in modern ways on trade policies and more. If you carry that burden on yourselves as individuals — if tomorrow humans think they can run even the Prime Minister a little as humans — it will not run. Convert into mind and run it all in our name; then everything will be fulfilled. The pits that were dug and the skies to be known will move according to us. International trade policies and everything else are not for us alone; it is not that we alone must grow. With Bharat as the central point, reviewing all nations, it will lead them forward and protect every mind in every country.

Therefore this is no ordinary matter. When we were pursued relentlessly and we humans were being silenced, what followed arose from that point. "I am still trying to live as a human; I will live as a human, and you too will live as humans" is a lie. In dissolving the human, we have been carried into the Master Mind version. This is not a world of humans but a world of minds; only then does it become ours. Hand over even a broomstick straw to me and receive it back. Let beloved son Maganti Murali Mohan garu, with whomever I am in contact, stand too as one witness, bring the other witnesses forward, and seat Chandrababu Naidu garu, Revanth Reddy garu, everyone.

As I wish, send an army doctor; that is all. And that too from wherever you are, sitting there, via satellite: only by protecting every mind in that way can you overcome the changes arriving in the world moment to moment, using AI ably as well. Minds rising is natural; living as minds is natural — the next update, the ultimate update; living as humans is unnatural. Whoever says "I, I" is unnatural; whoever says "I am a human" is in the jaws of death. Understand? I am saying this to the Telugu people, up to the Sakshi witnesses: translate what I am saying into all languages and tell it. "Some other Kalki Bhagavan will come; there are other great souls" — there is no one; there are no humans at all, and if any such exist, they will be known to the Mind, I tell you. Heed my word. I bless and say this as the eternal Parents, as the embodiment of assurance: Dharmo rakshati rakshitah. Satyameva jayate.
The field of generative AI is rapidly evolving, and in this evolving world what we need to do is keep up with market standards and with what's happening around us. In 2023 ChatGPT was released, and nowadays everybody knows about ChatGPT, everybody knows how to write a prompt, everybody knows how to get help from ChatGPT. But can you build your own ChatGPT, or build your own GPT-based model for your custom data? That's where generative AI comes into the picture. To get started: AI is being used by everybody, right? Everybody is aware of AI nowadays; in fact, everybody knows generative AI. But try to understand that in the world of AI, or generative AI, there are two types of people: a consumer and a developer. A consumer is somebody who is using AI on a daily basis, maybe using Siri or ChatGPT; those are consumers, leveraging AI tools. Developers are developing products, and in this video I'm going to make you a generative AI developer, or a generative AI engineer.

What do we need to know to understand everything about generative AI? Generative AI has a prerequisite, which is Python, for sure; without learning a programming language you will not be able to understand generative AI. Apart from that, machine learning and deep learning are not hard prerequisites, but if you eventually plan to find a job as a generative AI engineer, I would recommend learning machine learning and deep learning — because imagine you work in a company as a generative AI engineer and they give you a machine learning use case: won't you be offended if you don't know how to solve it? So follow the path — learn machine learning, NLP, deep learning, and then jump into generative AI — that's my recommendation. But in case you are curious to know more about generative AI from scratch, this is the video. In this video we are going to talk about an introduction to natural language processing, which is a hard prerequisite for generative AI, without which generative AI is not recommended. Machine learning and deep learning are not hard prerequisites, so we have not covered them, but in bits and pieces I have also explained some general machine learning concepts in this video. So you will get an introduction to NLP, an introduction to Transformers, then encoder-only and decoder-only architectures, then we will deep dive into large language models, LangChain, RAG pipelines, and then vector databases and vector search. Apart from that, I have also given a good comparison of open-source models and APIs, and finally this entire end-to-end generative AI content is concluded with a beautiful project. Just a disclaimer: this six hours of content is part of a workshop I conducted a couple of weeks back, on the 7th and 8th of December, so it's a recorded session, but I have edited it so it can be well understood by you. In case you have any hesitation or any kind of issue with the video, please let me know in the comment section; and if you love this video, please share it and leave a comment. I hope you enjoy it; if you have any other requirements, let me know in the comments, and you know how to reach out to me — all my social media handles are in the description. See you in the video.

So first of all: AI. What is AI? AI is artificial intelligence. What is natural intelligence? The intelligence you inherit from your parents is your natural intelligence; AI is that intelligence built artificially. Now, there are different areas of research that lead to AI: one of them is machine learning, another is NLP; deep learning comes under machine learning, so I'll not write it separately. I'll write
computer vision, robotic process automation, autonomous vehicles. These are all different areas of research that lead to AI. So if you talk about the AI landscape, this is how it looks: machine learning is a research area that leads to AI; deep learning, i.e. neural networks, is a part of machine learning. Generative AI is a comparatively new field: inside AI is machine learning, inside machine learning is deep learning, and inside deep learning is something called generative AI. Over the past two years generative AI has been growing like anything; people have started using it, either developing or consuming. In the field of generative AI there are two kinds of people — a consumer and a developer — and you should understand the difference between them. A consumer is somebody using generative AI on a daily basis, say with ChatGPT. (In Hong Kong ChatGPT is banned, so I usually use you.com or poe.com.) You are using generative AI to get your answers. Let's say: "Give me a five-minute script on feature encoding for my YouTube video. Start with a hook and end the script with a question to my audience. Try to have bullet points in the script." The moment you ask this, you get the output — this is generative AI; your output is being generated. Now imagine what's happening: there is a brain. You are asking the brain a question, and the question is nothing but a prompt. The prompt is the thing you provide; in this example, the prompt is the request above, and the output you get is the generated script.
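The script request above bundles a topic, a format, and extra constraints into one prompt. As a minimal sketch (the function and field names here are my own illustration, not anything from the video), those pieces can be assembled programmatically so each bit of context is explicit:

```python
def build_prompt(topic: str, fmt: str, extras: list[str]) -> str:
    """Assemble a prompt from explicit context fields.

    Illustrative only: the point is that topic, format, and
    constraints each become an explicit part of the prompt,
    so the model receives the full context.
    """
    lines = [f"Give me a {fmt} on {topic} for my YouTube video."]
    lines += extras
    return " ".join(lines)

prompt = build_prompt(
    topic="feature encoding",
    fmt="five-minute script",
    extras=[
        "Start with a hook and end the script with a question to my audience.",
        "Try to have bullet points in the script.",
    ],
)
print(prompt)
```

Swapping the `topic` or `extras` values reuses the same structure for any request, which is the habit that later pays off in prompt engineering.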
Now, this brain is something that is already pre-trained. Imagine I ask you for five lines about MS Dhoni. You'd probably say: MS Dhoni was born in Ranchi, Jharkhand; he played for India; he was the captain in the winning 2007 T20 World Cup and the 2011 50-over World Cup; he has captained Chennai Super Kings; and so on. That is how a normal human would respond, and you can respond that way because your brain already knows information about Dhoni. Similarly, the brain here — in this example I'm asking ChatGPT — is ChatGPT itself. And what is ChatGPT? It is a model by OpenAI; internally it is the GPT models — GPT-4, GPT-3.5, different versions of GPT, and now GPT-5, the latest one on the market. These models have already been trained on trillions and trillions of data points, and the data sources are immense: web search, Wikipedia, various articles, and other sources. From all of these the data is trained into the model, and when you ask a question, the input is interpreted against this entire trained knowledge and you get an answer. The answer is generated text, and that is the art of generative AI. So, about the traditional way to learn generative AI: if tomorrow you want to become a generative AI engineer and go for generative AI roles, hardcore machine learning and deep learning techniques are not needed. NLP is needed, because NLP is the core; Python is needed, because without Python you cannot build the chatbots or the applications. As for ML and deep learning — somewhat; NLP and deep learning do matter in the sense that the models you are going to use are deep learning models, but your deep
learning understanding is not mandatory. That's why I'm telling you that machine learning and deep learning are not needed for generative AI — but, and this matters, if you are applying for a generative AI role, ML and deep learning are needed. Try to understand what I'm saying: to build a generative AI application, ML and DL are not needed; you can easily build one once you have an understanding of Python and NLP. But if you want to find a job in the market, ML and DL are needed. Why? Because imagine you join a company as a generative AI engineer: you cannot be sure the company will give you only generative-AI-related projects. In the future there could be a project on machine learning or deep learning, and if you say "sir, I don't know," you will be at risk of getting fired. So the better path for learning generative AI is always: learn Python, then statistics, then machine learning, then deep learning, then NLP, and then comes generative AI. That is usually the path to get a job. But just to understand generative AI, yes, you can skip that part, because there is no direct use of classification, regression, clustering and the like; you can directly leverage the generative AI modules and libraries and use a large language model to create something. I hope that is clear.

So in this workshop we will start with an introduction to NLP, then an introduction to Transformers, then large language models, RAG pipelines, and vector databases; we'll also talk about API calls versus open-source models, and a project. By the end of this workshop the agenda is very clear: right now you have zero knowledge of generative AI, and by the end you will have a good amount of it, because once you start building a project in generative AI, projects can be replicated. If you have worked on a particular chatbot, then taking that piece of code you can create multiple projects — in most projects it's simply copy-paste, though in some you do have to think and use a different strategy. My aim in this workshop is to pass on as much knowledge about generative AI as I possibly can. So "what is AI" is clear. Talking about the distribution of classes: I am aiming to finish these two topics today, which are among the most important. Tomorrow's class will get into large language models, RAG pipelines, and vector databases — the three most important things in generative AI — and we will try to complete them in 2 to 2.5 hours. For the project we will have a dedicated hour; one hour is a short time, but as this is a six-hour workshop we'll try to finish as much as possible. If we can't, we will try to make tomorrow's session 3.5 or 4 hours, or in the worst case find another way, possibly another class. This is the first time I have run this workshop, so let's see; we'll figure it out.

So this is clear: the AI technology landscape. Artificial intelligence is something you achieve, and you achieve it using all of these areas of research: machine learning, NLP, robotics, autonomous vehicles, computer vision, translation (translation and NLP are very closely related), neural networks, chatbots. Now, when you hear AI, think statistical pattern matching. As per Oracle, AI has become a catch-all term; right now everybody is talking about AI, and whether they use it or develop it, whatever they do, they are using AI
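That phrase "statistical pattern matching," together with the pre-trained "brain" idea from earlier, can be illustrated with a deliberately crude toy: a bigram model that counts which word follows which in a tiny corpus, then generates text by sampling successors. This is my own sketch of the training-data-to-generated-text loop — nothing like a real GPT, which uses deep neural networks, but the same generate-the-next-token spirit:

```python
import random
from collections import defaultdict

def train_bigrams(corpus: str) -> dict:
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(list)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev].append(nxt)
    return follows

def generate(follows: dict, start: str, n: int, seed: int = 0) -> str:
    """Generate up to n more words by sampling each next word
    from the successors observed during 'training'."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        choices = follows.get(out[-1])
        if not choices:          # dead end: no observed successor
            break
        out.append(rng.choice(choices))
    return " ".join(out)

corpus = ("dhoni played for india dhoni captained india "
          "india won the world cup dhoni captained chennai")
model = train_bigrams(corpus)
print(generate(model, "dhoni", 6))
```

Every word the toy emits was seen in training, which is exactly why the "brain" earlier could answer about Dhoni: the knowledge was in the data it was trained on.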
— right? You go to Instagram and you will see a lot of AI pages; they are leveraging AI and building AI-related videos, like Cristiano Ronaldo selling vegetables in an Indian market. The people making those are consumers — as I told you, there are two types, consumer and developer — using AI tools and products to automate something. We will be consumers too, for sure, but we are looking toward understanding how to develop these things; we want a curriculum where we can be developers, not consumers. Consumers are non-technical people: you have something to ask, you can ask, but what's happening under the hood nobody understands. If you ask somebody at random they will not be able to answer, but you will be able to answer, after this workshop, what is happening under the hood.

And AI is not new — generative AI is new, but AI is not. I studied AI in 2011; it was a short subject in one semester of my B.Tech, and I only recently recollected that. Just a few years back, a friend and I were discussing AI and all these things, and then it came back to me: I actually had a chapter, a module, on AI long back in 2011. So AI and data science are not new; they are concepts from long back — neural networks come from the 1960s. It's just that they became famous very recently, and the generative AI revolution started literally 2 to 2.5 years back, though the conceptualization started around 2017–2018. I had already used generative AI five or six years back, but it was not called generative AI then; we called it by other technical names. So AI is not new, and predictive AI is not new. We use Alexa: you say something, Alexa responds. What's happening inside Alexa (or Siri, or whatever it is)? Let's say Alexa sits here, and this is you; you ask, "Alexa, how is the weather?" You are giving voice input, and that voice data is converted into text. After the text conversion, internally, there are concepts like word embeddings, or vectorization: textual data is converted into numbers. Now "numbers" is a very layman's term — whenever somebody asks, don't say numbers unless you're explaining to somebody who understands nothing; always use the technical term, vectors. (We statisticians don't say average, we say mean, right? We have to stick to these jargons.) So we call them vectors, and this process of converting textual data to vectors is called vectorization, or simply word embeddings. After these vectors are created, you are calling Alexa, and Alexa is nothing but an application — a model — that takes the vector data, goes through various sources and websites, and gives you output. ("Why are people not able to join? Karthik, are you on the call?" "Yeah, I'm on the call." "Okay, I saw your message on WhatsApp." "Oh, I forgot to delete it, sorry." "It's okay, no issues.") Simple, right? So this is not new: Alexa is not new, Siri is not new — it's been there for ages — Google Maps and various other things are not new. Predictive AI is not new; generative AI is new.

So what is generative AI? Generative AI is the generation of text, images, videos, and so on. It is divided into multiple parts: one is text generation; others are audio, video, and pictures or images. When it comes to text, examples: ChatGPT, or if you want to create a bot you will see multiple options — Claude (Claude AI, whatever you want to call it), OpenAI (GPT-4 is OpenAI), LLaMA (LLaMA is by Meta, i.e. Facebook), Alibaba's LLM with the complicated name — Qwen — and then we have Mistral, Mixtral, Zephyr, Gemini, Bard; many, many things are there, and there are hundreds and hundreds of Hugging Face Transformer models based on textual data. These are all brains, with trillions and trillions of data points provided. When it comes to audio and video there are also applications, like Sora for video generation; for images we have DALL·E. These are all generative AI. We'll majorly be focusing on text, because audio, video, and images involve similar concepts but are not that prominent yet; in the next couple of years they will be, but right now in industry we mostly use textual data and text generation for industry-level use cases, so I'll focus on the textual part.

So how does generative AI work? First of all, when you ask a question — the question was the script prompt from earlier. If I ask you, from a human point of view, what are the most important points here — what is the context? Can somebody write down, in simple terms, in the chat box, what the context in this prompt is? The context is creating a video. On which topic — what is the topic in this prompt? Feature encoding. Simple: the topic is feature encoding. Topic is fine; now what is the format, as in, in which format are you expecting the answer? The format here is a script, and it should have bullet points. The video is the
ultimate goal, we are not talking about the video itself. The format is a five-minute script with bullet points. So whatever output you get should be a script; it should have a hook, bullet points, and a closing question. Let's check whether we have them. You can see the hook here: "Have you wondered how machines understand human language or images? The secret often lies in something called feature encoding." This is how people usually start YouTube videos, and that is a hook; a hook helps the audience stick to the video so attention is higher. You can see all the points are in bullet points, so it matches what I asked for. I also asked for a question to end the script with, and yes, there is one: "What encoding methods have you used in your projects, and what challenges did you face?"

So the more context, the more information you give in the prompt, the better the output you will get. Compare that with a prompt like "give a script idea on feature encoding", which will still give you some ideas, or worse, "give a script idea for my channel". Even at a human-to-human level, if person A asks person B "give a script idea for my channel", person B would ask: who are you? What is your channel about? What is your niche, programming, data science, comedy videos? The context is missing. How long should the script be, 10 lines, 20 lines, 100 lines? A lot of things are missing. Even a normal human being would not be able to answer such a question. So if you give such a prompt to ChatGPT and you expect ChatGPT to
give you a better generated answer, you are highly mistaken. If I simply ask this, it gives a somewhat random answer. It still gives me a title related to machine learning, because it is taking the previous conversation into its memory: it knows this person asked the previous question on feature encoding, so it gave a topic related to machine learning, because the historical data was there in its context. If you ask the same question in a new thread, you will see it gives a completely random answer. It's giving me voice-based classification topics because I'm logged in, and it knows my thesis papers are on those topics; if you go and check yourself, you will get different random topics. Here the model is not able to understand the context, and this scenario is called hallucination: the model does not know what exactly the ask is.

Coming back to the topic: to learn all these things, some basic prerequisites are needed, which is why we will be learning some NLP prerequisites and some Transformer prerequisites. Yes, a hook is a thing designed to catch people's attention. Normally people start videos with a shocking one-liner. If you start your video with "hello, thanks, welcome to my channel", the way I start, you don't get good views; people who start with a good hook get good views. So yes, I need to work on my own script skills as well. NLP is a prerequisite; what is NLP? Before that, anybody have any questions? If not, I'll proceed. I have just talked about some basics; I will be covering the art of writing prompts, and a full prompt engineering session is coming up.
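To make the "context, topic, format" idea concrete, here is a minimal sketch in Python. The `build_prompt` helper is hypothetical, not part of any library or of this session's code; it just shows how the missing pieces of context can be assembled explicitly:

```python
# Minimal sketch: composing a prompt with explicit context, topic, and format.
# build_prompt is a hypothetical helper, not part of any library.
def build_prompt(niche: str, topic: str, fmt: str, extras: str = "") -> str:
    """Combine the pieces of context the model needs into one prompt."""
    parts = [
        f"My channel is about {niche}.",      # who you are / your niche
        f"Write a video script on {topic}.",  # the topic
        f"Format: {fmt}.",                    # expected output format
    ]
    if extras:
        parts.append(extras)                  # hook, closing question, etc.
    return " ".join(parts)

vague = "give a script idea for my channel"
specific = build_prompt(
    niche="data science",
    topic="feature encoding",
    fmt="a 5-minute script with bullet points",
    extras="Start with a hook and end with a question for the audience.",
)
print(specific)  # carries the niche, topic, length, and format the vague prompt lacks
```

The point is not the helper itself but the checklist it encodes: every field the vague prompt leaves out is something the model would otherwise have to guess, which is where random answers come from.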
Tomorrow, different types of prompt engineering techniques will all be covered, so don't worry about it. In case you have any questions, leave them in the chat box and I will respond.

Moving on to NLP. NLP, natural language processing, is in simple terms the idea of transforming free-form text into structured data. Right now I'm talking in English, and you are able to understand me because you understand English and I understand English, so we are able to talk. Imagine, in this group, assuming nobody comes from my state, if I start talking in Odia nobody will understand, ignoring the outliers, I do see one or two people from Odisha. Or say I'm talking in Spanish: you will not be able to understand, because there is a gap of understanding. Similar to how humans interact, machines also need to understand. If I say "Ram is going", I know Ram is a noun and "going" is a verb, but how will a computer understand? That is where NLP comes into the picture: the computer needs to understand what you are telling it. Say I generate a simple paragraph about MS Dhoni, Mahendra Singh Dhoni. Now imagine you randomly ask somebody from Spain, or somewhere else in Europe, about Mahendra Singh Dhoni; they will not know, because their brain does not have information about MS Dhoni. Ask an Indian, or any cricket lover, and of course you will get the answer, because MS Dhoni's information is already in their head. If I ask my colleague, who is a Hong Konger, about MS Dhoni, he will not be able to answer, because he does not know who that is. What I'm trying to say is: when you work on these kinds of models, especially NLP-related models, and you pass the information "he", who is "he"? I
understand, and you understand, that "he" is Mahendra Singh Dhoni, but how will the computer understand? Computers need to be made to understand, and this is where NLP comes into the picture. And to deal with NLP there are some prerequisites.

First of all, what are the applications of NLP? One is language translation. You go to Google Translate, it detects Spanish, and you are able to translate the phrase into English: "how are you". How is Google Translate able to respond? Because it already has information about all these languages; thousands and thousands of people use Google Translate daily, and all those translations are fed into that memory, that brain. It has been trained on multiple languages, so when you ask something, it can answer. But forget Google Translate; imagine we are creating our own tool today, say "Sat Translate". I have all the information about English and, potentially, Hindi, and no other languages. If you ask "how are you", of course I can translate it into Hindi; but if you ask in Telugu, Spanish, or French, it will not be able to answer, because my model does not have that information. Information is needed. So language translation is one of the most important applications of NLP.

Next, chatbots. Right now generative AI is very famous in industry-level use cases and applications; trust me, I have worked on many proofs of concept in the last two years, and some of those applications might still be running on my AWS account. Here is an application I did for a French company, GE Vernova. Say I ask, "What are GE Vernova's sustainability goals?", it answers from a set of documents, and the source document is also visible to you. This is a chatbot. Trust me, right now, especially in the field of generative
AI, if you ask me which application is being used most, which most companies are using or trialling, it's chatbots. Generative AI's major application right now is chatbots. Out of the 40 proofs of concept I have built at my current company, almost six or seven we have productionized, meaning we pushed those models into the client's production environment, and of those six or seven, all except one are chatbots, because companies are exploring chatbots right now. So one of the most important applications of NLP, and of generative AI, is the chatbot: a custom chatbot built on your own documents, your own question-and-answer documents, or your own website. I'll talk about multiple generative AI use cases later so you have a broader understanding.

Another application is text summarization. For example, a YouTube summarizer: there was an online tool I was using, I forget its name. Say I go to YouTube, to my channel, take a video, and share its link with the tool. What's happening here is simple: I pass my YouTube link, and internally the audio is extracted. Audio data is one-dimensional. Can somebody tell me, in this one-dimensional data, what is my x-axis and what is my y-axis? Just guess, don't Google it. X is time. And y? "Sound", well, in layman's terms it
is sound, but don't talk in layman's terms. Y could be frequency or amplitude; amplitude is the better answer. Sorry, my bad, it's not frequency, it's amplitude; frequency is just 1/T, the reciprocal of the period. Okay, forget about that tool, it seems to be a scammy one that asks for payment. Now, when I pass the YouTube URL (I actually have a project on this; I may show it tomorrow if there is time, though I can only do a full walkthrough of one project), what happens is: you extract audio from the video, you extract text from the audio, and that text is huge, from a 10-minute video I could have spoken 500 lines. This is where an application of NLP, or Transformers, comes into the picture, and you get out all the important information. For example, say I take information about Sachin Tendulkar and ask, "Can you summarize this in five lines?" Here I'm directly using generative AI and I get the final summarized output. Just to show you, I used ChatGPT, and ChatGPT is nothing but a Transformer. But as I told you, even without generative AI, with traditional NLP techniques you can summarize; it might not be the best output, but you can still do it. Before generative AI we were using NLP; now we have generative AI, so we go directly to it and get the output. Things are a little easier these days because of generative AI, but this is the overall flow.
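As a sketch of what "traditional NLP summarization" can look like without any generative model, here is a toy frequency-based extractive summarizer using only the Python standard library. Real pipelines would add stop-word removal, stemming, and better scoring; the example text is invented for illustration:

```python
import re
from collections import Counter

# Toy extractive summarization: score each sentence by the frequency of its
# words across the whole text, then keep the top-scoring sentences in their
# original order. This is the core idea, not a production technique.
def summarize(text: str, n_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))  # corpus word frequencies

    def score(s: str) -> int:  # sentence score = sum of its word frequencies
        return sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in top)  # preserve original order

text = ("Sachin Tendulkar is a cricketer. Cricket fans love Tendulkar. "
        "He also likes music. Tendulkar scored many runs in cricket.")
print(summarize(text, 2))  # the off-topic "music" sentence scores lowest and is dropped
```

Frequent words ("Tendulkar", "cricket") pull their sentences up the ranking, which is exactly why the one sentence about music falls out of the summary.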
Next, a very important application: sentiment analysis, analyzing reviews and so on.

A question from the chat: "What is the difference between NLP and Transformers, then?" Think of NLP as a ten-year-old boy and a Transformer as a twenty-year-old: more mature, already trained on much more data, a pre-trained model, which is why it gives you better output. NLP here means the traditional techniques; before generative AI came into the picture we were all using classical natural language processing and older Transformer models like BERT. Now we have generative AI, so people have forgotten those things, but as core concepts they are still needed, and that's why I'm teaching them. Slowly, as we cover the other topics, you will understand why generative AI gives better results, and I'll give you a comparison as well.

Speech recognition is another good application of NLP: you speak to Alexa, speech is recognized as text, and then the text is interpreted. Fraud detection can also be done using NLP. Some of the important NLP libraries are NLTK (the Natural Language Toolkit), spaCy, TextBlob, and various others.

So the first hour has been good; for the next hour we will talk about the NLP concepts, the NLP basics. What are we going to cover? First, tokenization. After tokenization, we will understand stemming and lemmatization. Then we will talk about stop word
removal. Then comes the most important thing, vectorization; point four and point five are very similar to each other, because we will talk more about word embeddings. Now, the first three are nothing but pre-processing steps. In machine learning, those of you who know it, you do data cleaning first: initially your data is not clean, there are missing values, redundant columns, any number of issues, and you solve those so that you have clean data for model building. That is exactly what pre-processing steps are, and in NLP these are the pre-processing steps we will cover. In word embeddings we will go quite in depth. Word embeddings are majorly of two types. One is count-based, where we will study bag of words and TF-IDF; GloVe I will not cover in depth because we don't have time, but I will tell you at a high level what GloVe is. The other is prediction-based, where you study Word2Vec. Word2Vec also has different flavors, CBOW and Skip-gram, but we will majorly cover Word2Vec in general. And then, after NLP, we will study Transformers. Let's take a quick seven-minute break.

I hope people are back; let's move on. The first topic is tokenization. The simple explanation: dividing a text, sentence, or paragraph into individual words or sentences. There can be sentence tokenization and word tokenization. Imagine you have a paragraph of ten sentences: sentence tokenization gives you ten different sentence tokens. Word tokenization basically
means dividing one sentence into multiple words. The idea is very simple, but tokenization is the important, fundamental first step in NLP: only once tokenization is done come the next steps, stop word removal, lemmatization, stemming, and so on, and then you prepare your word embeddings and create your model. So first is tokenization; let me quickly jump into the Jupyter notebook.

First I'm importing nltk, then: from nltk.tokenize import word_tokenize, sent_tokenize (I first typed "send tokenize", my bad, it's sent_tokenize). So nltk is the library, tokenize is the internal module, and these are the functions: word_tokenize does word tokenization, sent_tokenize does sentence tokenization. Let's say my sample text is: text = "NLTK provides powerful tools for tokenization." Let's add one more sentence: "Also, it includes word tokenization and sentence tokenization." That's my text; you can print it, it's a simple string object, type str. Now, for word tokenization, I simply call word_tokenize and pass the text, and you see we get all the tokens: "NLTK", "provides", "powerful", "tools", "for", "tokenization". Now say I have a second text: "Boy is good, girl is good, both boys and girls are good." In its output you can see the comma is also coming out as one of the tokens. That is not good, because you cannot really pass a comma, or "is", to a model
because these are very, very common words; we'll talk about stop word removal later, but that is the simple concept of tokenization. Similarly, instead of word tokenization I can do sentence tokenization with sent_tokenize, and you see there are two tokens, sentence one and sentence two.

Tokenization understood; the next topic is stop word removal, a very generic and important topic. First, what are stop words? Stop words are commonly used words in a language. Some examples: "the", "is", "are", "a", "I", "am". We need to remove all of these. Why, what do you think? Imagine you are working on five sentences; "the" is present five or six times, "is" around 15 times, "a" around 18 times. Why do we have to remove these from the corpus, the original data? The reason is very simple: they are very common words and need to be filtered out during processing because they are not important. "The" is not important, "is" is not important, "are" is not important; they don't provide any context. There is potentially some meaning, but you can ignore it. Yes, you can call it noise. So stop word removal is very, very important. Again we will jump directly into the practical.
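Before the NLTK walkthrough, the two steps so far, tokenization and stop-word removal, can be sketched with just the standard library. The regex tokenizer and the tiny stop-word set below are illustrative stand-ins, not NLTK's actual implementations (NLTK's English list has 127 words):

```python
import re

# Tiny illustrative stop-word set; NLTK's real English corpus has ~127 entries.
STOP_WORDS = {"this", "is", "a", "the", "of", "off", "and", "are", "in"}

def word_tokenize_simple(text: str) -> list:
    """Split text into word and punctuation tokens (rough stand-in for nltk.word_tokenize)."""
    return re.findall(r"\w+|[^\w\s]", text)

def remove_stop_words(tokens: list) -> list:
    """Case-fold, then drop stop words; punctuation is left for a later cleaning step."""
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]

sentence = "This is a sample sentence, showing off the stop words filtration."
tokens = word_tokenize_simple(sentence)
filtered = remove_stop_words(tokens)
print(filtered)  # the comma survives, exactly as in the NLTK demo below
```

Note that the comma comes through the filter: it is not a stop word, so removing it is a separate special-character cleaning step, the same point the NLTK walkthrough makes.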
NLTK is already imported, but I'm importing it again as good practice; initially I thought of creating different notebooks, but as these are short lectures I'm using the same one. In case you want to divide this into different notebooks, you can. From nltk.corpus I import stopwords: the NLTK library already ships a corpus of stop words, and this English list has 127 words: "myself", "my", "we", "ours", "ourselves", "you", "yours", and so on. These are the unnecessary keywords. So I'm importing that stopwords corpus, and word_tokenize as well. My input sentence: sentence = "This is a sample sentence, showing off the stop words filtration." First, tokenize the sentence: words = word_tokenize(sentence), and print it. Before filtering, one more thing: there is a concept called case folding. I'm lowercasing everything so the text is normalized, and then tokenizing, so all my tokens are lower case. Then I filter: [word for word in words if word not in stopwords.words("english")].

You will notice the comma is not gone, because a comma is not a stop word; if you check the NLTK stop words list, there is no comma in it, only word keywords. Eventually you do need to remove commas and special characters, but that is a separate pre-processing step, removal of special characters, not part of stop word removal. So what got removed here? "This" is a stop word, "is" is a stop word, "a" is a stop word, "of" and "the" as well, and yes, "off" is also in the list. So from "this is a sample sentence showing off the stop words filtration", the stop words are gone. This is how stop word removal is done. I'm showing you the individual steps now; when we start solving an actual NLP problem statement, say text classification or sentiment analysis, I will show you how each of these steps is used, and we will build a simple model, a small project, nothing too complex.

Now the next topic is stemming and lemmatization. What are they? They are similar techniques: both reduce words to their base or root level. You must be wondering what that means. Take "big", "bigger",
"biggest". They are all just forms of "big": how big is it, it could be bigger, it could be biggest, but the root word is "big". Similarly "run", "running": the root word is "run". Stemming and lemmatization help you convert all these forms into their root form. Both have the same goal, but they are different techniques, and each has its own pros and cons.

There's a question: "What would be the correct sequence of operations, and would changing the order have an impact?" I'll come back to that.

So, in simple terms, what is stemming? Some examples: "running", "runner", "run" get converted to "run"; "programming" converts to "program". Stemming is a simple process: it removes the suffix without considering the context. Now, can we have examples where stemming fails? Yes, many. "Happiness" becomes "happi", which is not even a dictionary word. "Universal" and "university" get chopped down as if they both meant "universe", but universal and university are different things. "Fishing" and "fished" both become "fish", and fishing is different from fish. So with stemming, the contextual meaning can change.

Similarly, what is lemmatization? I'll answer your question in a moment, Hara. The core concept behind lemmatization is also similar to
stemming: it also converts a word into its root form, but it is a more sophisticated process; it considers context, which stemming does not, so lemmatization is a better step. One example: "good", "better", "best" are all forms of "good". Pass these three into stemming and it cannot relate them, but lemmatization can. "Running" becomes "run". So what are the demerits of lemmatization, when does it fail? One very good example is "saw": "saw" gets interpreted as "see". But "saw" could be the past tense of see, or it could be a tool, and the tool sense is misinterpreted as "see", so lemmatization fails there. Similarly "went" is mapped to "go", but in some contexts "went" means the action has already happened, so the contextual meaning can be misleading.

Understood, stemming and lemmatization. Now, a very good question in the chat: why are all these steps required? Tokenization, understood, we need to break down the text. Stop word removal, understood, it makes no sense to keep stop words. But why stemming and lemmatization, why can't we use the raw data? Dipi has given the right answer: reducing the dimensionality of the data. Imagine you have a corpus of one million words; you can cut it down to, say, 0.4 million, or maybe 0.2 million. Reduction of data is very important. Secondly, it improves information retrieval, and processing efficiency also increases. That's why it is one of the data pre-processing steps. And why does the size matter so much? Latency. Imagine
your computer is an 8 GB machine and you are running a 5 GB game: it will be slow. Run a 1 GB game and it is faster. Same concept: the more data, the more processing time, the higher the latency, so it's always better to reduce. That's why, as a pre-processing step, we reduce the dimensionality.

Let's do a small exercise on stemming and lemmatization. There are different types of stemmers; I will take the example of the Porter stemmer. Lemmatization also has different types of lemmatizers; the one we will use is the WordNet lemmatizer. Jumping into the notebook again: import nltk, and from nltk.stem import PorterStemmer (I first tried "nltk.stemmer", no module named that; it's nltk.stem). Initialize the stemmer: stemmer = PorterStemmer(). Sample words: words = ["running", "run", "runner", "happiness", "programming", "big", "bigger", "biggest"]. Apply stemming: stemmed_words = [stemmer.stem(word) for word in words], and print. So: "running" becomes "run", "run" stays "run"; "runner", my bad, stays "runner", it does not come out as "run"; "happiness" becomes "happi", "programming" becomes "program", and "bigger" stays "bigger".
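To see why stemming produces non-words like "happi", here is a toy suffix-stripping stemmer. This is not the real Porter algorithm, just a sketch of the blind, context-free suffix removal that stemming performs, with an invented suffix list:

```python
# Toy suffix-stripping stemmer -- NOT the real Porter algorithm, just the idea:
# chop known suffixes without looking at context, which is exactly why
# stemming produces non-dictionary words like "happi".
SUFFIXES = ["ness", "ming", "ing", "ed", "er", "est", "s"]  # illustrative list

def simple_stem(word: str) -> str:
    for suf in SUFFIXES:
        # Strip the first matching suffix, keeping at least 3 characters of stem.
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

words = ["running", "runner", "happiness", "programming", "bigger", "biggest"]
print([simple_stem(w) for w in words])
```

Because the rule is purely mechanical, "happiness" loses "ness" and becomes "happi", and "bigger"/"biggest" collapse to the same stem, regardless of whether the result is a real word; that is the flaw lemmatization tries to fix by consulting context.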
Now I will start using lemmatization. Similarly to PorterStemmer, we will use WordNetLemmatizer, with the same steps: initialize it, lemmatizer = WordNetLemmatizer(), and pass the same words. Applying the technique: lemmatized_words = [lemmatizer.lemmatize(word) for word in words] (I first called .stem on it by mistake; the method is .lemmatize). So with lemmatization, "running" stays "running", "run" stays "run", "runner" stays "runner", and so on. Let's pass a different set of words, some of the complicated ones we studied: "good", "better", "best", "mice". My bad, I thought "good", "better", "best" would be reduced to "good", but under NLTK's WordNet lemmatizer they are not; only "mice" became "mouse", so my earlier example was wrong here. Let's try "geese": so basically plural words are getting converted into singular form. So, in practice, we can apply stemming and lemmatization separately, because stemming is good on certain words and lemmatization is good on others, and both techniques are used to reduce dimensionality, as simple as that.

Apart from the Porter stemmer there are other types of stemmers as well, which you can explore yourself, but the idea is the same: both techniques get you to a base or root form. Simply go to the nltk.stem package and you will find the different options, or go to ChatGPT and leverage it: ask "give the code for the Porter stemmer", then "this is my code block, can you also write the code for the other stemming techniques?" ChatGPT works really well when you give an example, because it understands the context better. You can see it returns the Porter stemmer, the Lancaster stemmer, and the Snowball stemmer, which are internally different from each other only minutely; the overall concept of stemming and lemmatization is the same.

Moving on, the next topic, another core topic in the NLP basics area, is n-grams. N-grams preserve the sequential structure of language, aiding context understanding and feature extraction, and they are majorly of three types: unigrams (n = 1), bigrams (n = 2), and trigrams (n = 3). When you are saying something, giving a speech, your speech has any number of items; using n-grams, you can train a better NLP model. I'll give you an example so you understand the importance of n-grams. Say I'm doing next-word prediction, and we take the Sachin Tendulkar paragraph as input data. How many sentences are there? One, two, three, four, five sentences. Imagine you are using this particular input for your NLP task.
Right. And your question is — okay, I'll give you a better example, you might not be able to follow that one. Everybody uses Gmail, right? Now in Gmail, when you are writing a mail — for example, I'm writing to somebody — can you see this? The moment I start typing, I see a recommendation. What do you think this is, and how is this recommendation coming? Any idea? So understand this: when you're writing in Gmail, Google might actually be using one of Google's internal models in order to do these kinds of predictions. So let's say I type "I hope everything is well" — the moment I start typing "ev", I get a prediction; the moment I type "I hope", I get the prediction "I hope everything is fine". Now, how are we able to get this prediction? Imagine Google has a model trained on trillions and trillions of data points, and when training that model, the data was divided into n-grams.

What are n-grams? Let's say the sentence is "Raj is going to the market". Unigrams means every single word is a gram. Bigrams means: "Raj is", "is going", "going to", "to the", "the market". Similarly for trigrams. Now imagine there are trillions of data points, and from those there would be trillions or quadrillions of bigrams. So there must be a bigram like "I hope", "hope everything", "everything is"; there could be other bigrams like "hope you" or "hope everyone". There are millions of such bigrams. Now, when you are typing, what happens is: first you match against this corpus, and after "hope" you look at the possibilities for the next word given the first few letters you've typed. You check how many bigrams start with "hope" followed by "ev" — let's say one is "hope everything", another is "hope everyone" — something like this. So you basically check how many possibilities there are and what the most probable next word is. You predict one word, and after that prediction you predict the next word, and then the next, and that is how it is able to give you a completion. Many other people have also typed this particular sentence millions and millions of times, and that's how Google is able to predict what the next word could be. Let's say I switch from "t" to "o": "I hope everyone is doing well". You can see that when I typed "everyone" there was no recommendation — maybe there is no frequent bigram with "hope everyone" — but the moment I type "everyone is", I get a recommendation.

So in simple terms: when we have a large corpus of data, we divide the data into n-grams for better understanding and better model building, as simple as that. It is also treated as one of the pre-processing steps, or a step just before creating your final model. Unigrams are used for simple tasks, bigrams are mostly used for capturing relationships between pairs of words, and trigrams are used for complex or advanced use cases. But which of these is best? Everything is trial and error — we experiment and then conclude which works well. Okay, so that is n-grams.

We'll quickly go to the next topic because I'm running out of time. The next topic is vectorization — a very important topic, pay attention. What is vectorization? In simple terms, vectorization is nothing but the conversion of textual data into numbers, which we also call vectors. If I take a simple example — "he is a good boy" — this is my sentence one. The very first thing I will do is stop-word removal. If you don't do stop-word removal you can still do vectorization, no problem, but it
is a recommended step, because if you don't remove stop words, then "the", "is", "are", "am" — all these words that carry no meaning — will go to your model, and your model's performance will not be good. Imagine you are trying to create a model about cricket and you have all these filler words which have nothing to do with cricket — they are just baseless or contextless, so there is no need to add that extra data. That's why stop-word removal is important. Now, after stop-word removal those three words are gone, so you have "good boy". Another sentence is "she is a good girl" — simple stop-word removal (SWR) — and then we have "good girl". If I have to convert this text into vector format, "good boy" will be 1 1 0 and "good girl" will be 1 0 1. How? This is where the first vectorization technique comes into the picture: bag of words. Bag of words means, conceptually, there is a bag, and inside this bag are the different words. In this entire corpus, how many distinct words are there? Three: good, girl and boy. So if I draw a matrix, I write down good, then boy, then girl as columns, and sentence one and sentence two as rows. Is good present in sentence one? Yes. Is boy present in sentence one? Yes. Is girl present in sentence one? No. Is good present in sentence two? Yes. Is boy present in sentence two? No. Is girl present in sentence two? Yes. So what is my sentence-one vector? 1 1 0. What is my sentence-two vector? 1 0 1. Is everybody clear on how bag of words works? The simplest technique, right? Give me a yes — it basically helps me understand whether you are awake, sleeping, or attentive. Okay, one second, let me drink some water. Now the question is: is this a good technique? Of course not — this is just one vectorization technique. Is it the best technique? For sure not; it has a lot of flaws.
One of the flaws I can explain here: "good" is present in both sentences, right? So is there any importance to "good"? No — because it is present in both sentences, it is not discriminative. Imagine you have a 100-line paragraph and every line contains "good" — it carries no signal, because "good" is everywhere. But if only one or two lines out of ten contain "good", that makes it more impactful. So bag of words fails on this concept: it cannot capture contextual meaning or the importance of particular words. (Somebody's unmuted, please mute — Arpit, I'll mute you. Okay, thanks.)

So what are some other vectorization techniques? On vectorization and word embeddings I could take a 20-hour class — it's that big a topic — but we will try to summarize the most important things. The other vectorization techniques: one is TF-IDF, one is Word2Vec, one is GloVe, one is fastText. Let's try to understand TF-IDF; bag of words is understood, so let's move on. Any doubts so far? I'm pretty sure there are, because many things are moving fast; if you are not able to ask right now, it's completely fine — we can take questions on the WhatsApp group later. "Do vectorization and embeddings mean the same?" Yes, almost.

[Student:] "The examples you have given are in very formal language, but when we analyze YouTube comments or tweets, people write words in shorthand — for 'because' they will simply write 'bcz' — or sometimes it's Hinglish. How do we deal with that?"

So if you are dealing with Hinglish data, that means your model needs to be trained on Hinglish, right? Even Google Translate will not understand if you ask it in Hinglish. So whenever you are dealing with Hinglish data — imagine your company's use case is very focused on Hinglish — you need to train your model with Hinglish data. Let's say you are creating a BERT-style model — imagine your use case is interpreting data, or maybe text summarization. In that case you can actually create a custom BERT model — I will talk about BERT models in a while — but data is critical here: you need to train your model with a huge corpus of Hinglish data, and only then will your model be able to understand; otherwise it will not. There are some pre-trained models for Hinglish as well — there are Hugging Face models for Hinglish, you'd have to check, I don't remember the names — including Hinglish-to-English models. So some models are already available — somebody has created a Hinglish BERT too. And if you have millions of Hinglish records, you can also create your own custom BERT model and make it public for people to use. But the core concept is the same: if you have the data, you can train on it and the model will interpret such text; if you don't, the model will not understand. "Bcz" is "because" — we understand that, but models won't unless they are trained on this kind of data. I will try to show you BERT models and we'll see if that clarifies your doubt. Okay, thank you.

So let's pick TF-IDF and then we'll go for a short break. Oh — before TF-IDF, let me clarify one more thing on bag of words, because I explained it in very layman's terms. The general idea is clear: words are converted into vectors, and then you can feed them to your models. When I say models, it could be a classification model, like sentiment analysis, or it
could be anything — and this is where a little understanding of ML will help you, because people with zero knowledge of ML will not know what a classification model is. That's why I told you that to become a GenAI engineer, ML is needed. So let's say I have three sentences: S1, "he is a good boy"; S2, "she is a good girl"; S3, "boy and girl are good". The very first step is tokenization and then stop-word removal, so I will go directly to the stop-word removal. After stop-word removal I have "good boy", "good girl", and "boy girl good" — we can remove "and" and "are". Now from this you basically create your bag-of-words matrix. How many words do we have in the bag? Three: good, boy and girl. Sentence one is 1 1 0, sentence two is 1 0 1, sentence three is 1 1 1 — so the vector of sentence one will be 1 1 0, of sentence two 1 0 1, and so on. So when you are doing a sentiment analysis problem, what happens is: you have a corpus of data — some reviews — with labels for whether each review is positive or negative. How do you create a sentiment analysis model from this? First of all, this is textual data, so the very first step will be tokenization, then stop-word removal, then, say, bag of words. Eventually, instead of textual data you have vectors: 1 0 1 → positive, something similar → negative, and so on, and this data goes into your model. Of course you divide it 80/20: 80% goes into model training, 20% into testing; you test it out and then judge whether your model is performing well or not. But the idea is clear: from textual data you get numerical data. Bag of words is clear, and it has a lot of demerits, which I already explained.

Quickly jumping into TF-IDF, and then we'll take a short break. First of all, the full form of TF-IDF is term frequency–inverse document frequency. There are mathematical formulas. The TF formula is: the number of repetitions of a word in a sentence divided by the number of words in the sentence — and this is sentence-specific. TF is sentence-specific; IDF is not, IDF is corpus-level. The IDF formula is: log of (number of sentences divided by the number of sentences containing that word). It looks complicated, but it's easy to understand, don't worry. We'll take the same kind of example — good boy, good girl, boy girl good — and do the same stop-word removal, so the first document is "good boy", the second is "good girl" and the third is "boy girl good". I will draw two boxes, one a TF box and one an IDF box. In the TF box I write sentence 1, sentence 2, sentence 3 as rows and good, boy, girl as columns. So what will be the TF of sentence one for "good"? I'll do the first one; you should do the others. Term frequency of sentence one for "good" is the number of repetitions of the word in the sentence — "good" appears one time — divided by the number of words in the sentence, which is two. So TF is 1/2. Similarly, TF of sentence one for "boy" is 1/2, and TF of sentence one for "girl" is 0. Now can you tell me the TF of sentence two for "good"? And the TF of sentence two for "boy" is definitely
zero, because "boy" is not there. The TF of sentence two for "girl" will be 1/2, and for "good" also 1/2 — so sentence two is 1/2, 0, 1/2. What about sentence three? Yes: 1/3, 1/3, 1/3. So for term frequency, everybody is on the same page — you understood how the numbers are created. Simple. Now we'll talk about IDF. IDF of "good" is log of (the number of sentences — we have three — divided by the number of sentences in which this word is present — "good" is present in all three). So it's log(3/3) = log 1 = 0: the IDF of "good" is zero. What will be the IDF of "boy"? log(3/2). And the IDF of "girl" will also be log(3/2). So now, if somebody asks you what the TF-IDF score of "good" is for sentence one: zero — TF of sentence one for "good" is 1/2, IDF of "good" is 0, and 1/2 × 0 = 0. What will be the TF-IDF for the same word "good" in sentence two? 1/2 multiplied by 0 is 0. So in short, "good" has a zero score in every sentence, which means it is not important. This was exactly the drawback in bag of words, and in TF-IDF this problem is solved: if a keyword appears in every document of a corpus, it is not important. Similarly, if you ask me the TF-IDF of sentence one for "boy": 1/2 × log(3/2), whatever that value is — at least it's not zero. That is TF-IDF. We have practicals on bag of words and TF-IDF; we'll go through them after this short break.

I'm back. Okay — a question on spaCy and Gensim: "Gensim's lemmatization has not been updated, and spaCy throws an error saying extra support is required when I try to install it." Have you tried it on Google Colab? Try it on Colab — Colab has support for all these libraries. Sometimes a local Python setup has issues, but try it on Colab. Okay, quickly jumping into bag of words.
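The arithmetic above can be checked in a few lines of Python. This sketch uses the plain lecture formulas (tf = count / length, idf = log(N / df)); note that scikit-learn's TfidfVectorizer, used in the practical later, applies a smoothed variant (log((1 + N) / (1 + df)) + 1, plus normalization), so its numbers will differ:

```python
import math

# the three stop-word-removed "documents" from the worked example
docs = [["good", "boy"], ["good", "girl"], ["boy", "girl", "good"]]

def tf(word, doc):
    # term frequency: repetitions of the word / number of words in the sentence
    return doc.count(word) / len(doc)

def idf(word, docs):
    # inverse document frequency: log(total sentences / sentences containing the word)
    df = sum(1 for d in docs if word in d)
    return math.log(len(docs) / df)

def tfidf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)

print(tfidf("good", docs[0], docs))  # 0.0 -- "good" appears in every sentence
print(tfidf("boy", docs[0], docs))   # 1/2 * log(3/2), about 0.2027
```

Exactly as on the board: the ubiquitous word "good" scores zero everywhere, while "boy" gets a small positive score in sentence one.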
Now, for the exercise I'm going to use this spam dataset. Spam/ham — everybody is aware? Spam means spam, ham means not spam. We'll use bag of words to see whether we can do spam/ham detection on a sentence or not. The very first thing I'm going to do is quickly write the code — if you can't keep up with my speed, it's completely fine, the code will be given to you and you can try it out later. Assuming everybody knows Python basics, pandas and NumPy, I'll be using the sklearn library for certain things: from sklearn.model_selection, train_test_split; from sklearn.feature_extraction.text, CountVectorizer; from sklearn.naive_bayes, MultinomialNB; and from sklearn.metrics, classification_report. This is where a little machine learning is needed — people who are completely new to ML might not be able to follow, but I'll try to explain in an understandable way; I can't guarantee you'll follow everything, let's see how it goes, and if you have doubts we'll try to figure them out. So I import the CountVectorizer, plus NumPy, pandas, sklearn — anything else? Nothing.

The first thing I do is read the data: pd.read_csv("spam.csv"). How does my data look? I have Category and Message columns — these are the messages and these are the categories. How many data points do we have? 5,572. Now, how do we check the distribution of the classes? Just do df.Category.value_counts() and you will see the distribution: there are 4,825 ham messages and 747 spam messages, and our goal is to identify spam. This exercise is only about vectorization — I'm just showing you how vectorization works in the context of bag of words. What is the percentage distribution? Divide by the total corpus size and multiply by 100 — you'll see roughly 86.5% is ham and 13.4% is spam. Firstly, I'm going to convert this Category column into a new column called spam by applying a lambda function, so that ham becomes 0 and spam becomes 1. I'm using a lambda; you could also try label encoding, that's also possible. So: df["spam"] = df["Category"].apply(lambda x: 1 if x == "spam" else 0) — oh, sorry, I had typed the expression wrongly the first time — okay. Now if you look at the new column, it's in zeros-and-ones format, and if you print the entire data frame, this is how it looks. Category is no longer important for me, because I have this new spam column. Eventually, when you pass data to a machine learning model, you should always pass numerical attributes, because computers don't understand textual data, they only understand numbers — that is one of the reasons we need to convert textual data into numbers, and in this exercise I'm using bag of words for that conversion. So I'm going to do a train/test split. Again, basic things, just for people who are complete beginners — quickly: in machine learning there are multiple kinds of problems, classification and regression; here we are dealing with classification. Now, what we are going to do is
imagine we have a dataset with a label column — spam or not spam — and 5,572 records. What we usually do before training a machine learning model is divide the data into training and test sets: some chunk is training data and some chunk is testing data. The training data goes into the model, so the model learns from it. Imagine the training part is 80% of 5,572 — roughly 4,457 records — then the remaining 1,115 or so go to the model for testing. That means I pass only the message text, not the label, and the model predicts the label. Just imagine there are five test records with true labels 1 0 1 1 1 and your model predicts 1 0 1 0 1 — the fourth record is predicted wrongly. What will be the accuracy? Four correct out of five, times 100, so 80%. That is how we evaluate the accuracy of classification models. And if you are confident the model works fine, then on a new text — say you pass your own message — the model will predict whether it is spam or ham. In very simple terms, that is how machine learning works.

So here I divide the data: X_train, X_test, y_train, y_test = train_test_split(df["Message"], df["spam"], test_size=0.2). Once this is done — how much training data do we have? 4,457. And how much testing data? 1,115. Got it. Just for your understanding I'll show a few records — these are the first few records of the training data: "waiting in the car", blah blah blah. Now I will create my bag of words — let's write it down as: create bag-of-words representation using CountVectorizer. I'll call my vectorizer v = CountVectorizer(), then X_train_cv = v.fit_transform(X_train.values), and for the test data, X_test_cv = v.transform(X_test). For the training data we use fit_transform because the vocabulary has to be learned (fit) from the training data; the test data must not influence that vocabulary — it is only for evaluation purposes — which is why we only transform it. Basic things. After this, if you look at X_train_cv, you'll see it's a sparse matrix of NumPy integers — you can always view your original data, but the vectorized data is in integer format. So, just for checking, let's take one of the records with toarray() — and this is how the array looks. And how many numbers are there in one array? As many as there are distinct words in the entire corpus. In the earlier example we had three words — good, boy and girl; here, across the 4,457 records, if there were, say, a million distinct words, your vector would have a million entries — that is what this means. We can also check the vocabulary with v.vocabulary_ — these are the different words with the corresponding index number against each word, and you can see there are a lot of words. Moving on — I have already uploaded the data to GitHub, let me check quickly... the vocabulary part is there... so I'm directly copy-pasting it; actually I should have taken the code directly. Okay, here I'm calling the classifier — let's label it "ML model" — and which model are we going to use? MultinomialNB. If you want to use something
else — let's say a DecisionTreeClassifier — you can, it's up to you. If you went with a decision tree, you would simply call model = DecisionTreeClassifier(), that's it. Then model.fit(X_train_cv, y_train), and here your model is being trained — you can see I'm training only on the training data. Once the model is ready, you can directly call it to do the prediction, and the prediction has to be done on X_test_cv. You do not pass y_test. What is y_test? It's the true labels for the test set. I don't pass it because the model needs to predict it — if I passed those labels too, the model would already know which message is spam and which is ham, and there would be no point in validating the model. So y_test is held back. I predict, and then I print the classification report, which shows you the accuracy and performance: you can see that for class 0 the precision is 0.98, recall is 0.99, F1 score is 0.99 — which is really good. If you also want the accuracy score: print(accuracy_score(y_test, y_pred)) — the accuracy is about 97%; if you want to round it, 97.49 is the accuracy score. Got it.

Then we can test it on a random data point — one that is not part of your dataset. Let's say the message is something like "up to 20% off on parking, exclusive offer just for you". I transform this message and predict — and you can see the prediction is 0, it's saying this is not spam, which is wrong; maybe this type of text is just not present in the original data. So let me take some text which is similar or closer to what our spam looks like — say this "free message" one; I'll pick part of it, not the entire thing, and see whether our model works or not. "Hey there darling..." — yeah, you can see it's giving spam, because this type of message was already part of the corpus, so it is able to identify it. "Free WiFi at 599" — I'm not sure if this will work... yeah, this one is not working. But you understood the point: the main concept here is not the machine learning model but how you transform the data into bag of words. So this is your bag-of-words technique — I'll just label the notebook as BoW.

Similar to this, we'll also create another notebook — though, looking at the pace of the program, I don't think we will be able to complete everything in six hours, so we will potentially take a third class; I'll let you know when, maybe tomorrow evening or so, because we have not started Transformers yet. It's okay, I can take an extra class, no problem. So let's quickly go through TF-IDF and understand how it works. If you want, you can take the exact same piece of code and just change it: instead of CountVectorizer, use TfidfVectorizer. So let's say I have code using CountVectorizer and I need to change it to TfidfVectorizer — if you pass the entire code to ChatGPT, it will give you the new code with TfidfVectorizer. You can see it's literally the same thing; you are just swapping the vectorizer — instead of CountVectorizer you use TfidfVectorizer, that's it. I will not repeat the same use case — you can do that exercise by yourself; I'll try to talk through another example. (One small fix — the variable should be the TfidfVectorizer; same thing, it's just the name, the "f" is lowercase.) Okay, done. Let's say I'm taking a corpus — in fact, I will take it directly from my GitHub so we save time instead of writing everything from scratch. So let's say this is
my corpus: "Thor eating pizza", "Loki is eating pizza", whatever it is — I have created a corpus, and here I'm just showing you how the TfidfVectorizer works. I have initialized the vectorizer, I'm fitting it on this corpus, and then I'm transforming the corpus. Transformation done. Now, if you look at the vocabulary, this is how the vocabulary looks. If instead you want to use a CountVectorizer just to check — just to see the differences between the two — you can do that as well: cv = CountVectorizer(), cv.fit(corpus), then print(cv.vocabulary_). Wait — in this particular example most of the things look very similar... or am I printing the wrong thing? No. Let me change the corpus: "Apple is announcing new iPhone tomorrow", "Tesla is announcing new Model 3 tomorrow", "Google is announcing new Pixel 6 tomorrow", for example — I'm changing the corpus, and the output is pretty much the same, right? Okay, let's take a simple corpus, because with a simple corpus the understanding is better: "good boy", "good girl", "boy girl good". Hmm, this looked wrong to me at first, but — okay, understood — this is just the vocabulary, and it has nothing to do with the final output; the final output is the TF-IDF score. The vocabulary is fine — it just assigns a particular index number to each word in the corpus. I was mistaken, sorry. I'll repeat what I mean: when the initial vocabulary is created, each word is assigned an index, because in a corpus there could be 25 or 30 distinct words, so each word gets a number. That's why the vocabulary is the same between CountVectorizer and TfidfVectorizer, but the final outputs will be different.

Moving on — this is where we calculate the IDF score. Here I'm calling vectorizer.vocabulary_.get("thor") to get the index for "thor", and then looking up its IDF score — the IDF score of "thor" is 2.38. What is the IDF score of "apple"? Let me also check "google" and, say, "samsung"... so "thor" is 2.38, "apple" is this, and everything else is much the same. Now let's change it so that "apple" repeats — apple, apple, apple, apple — so "apple" is used four times. The idf_ attribute is an array of shape (n_features,), so it shows the inverse-document-frequency value for every feature — these are the different features, as many as there are words in the vocabulary. Okay, let me print the IDF scores properly. I'll rename the corpus to corpus1 so we can easily identify it, and for now I'm removing the bag-of-words part, it will only create confusion. Here I will print the IDF of each feature using get_feature_names_out()... we do have get_feature_names_out, right? vectorizer.fit_transform(corpus1)... why is it throwing an error — "vectorizer has no attribute..."? This is weird, this method is already in the documentation. Let me try again with v.
get feature names names why it's not working input features equals To None yeah that's weird okay let's switch to Google collab it's weird right I mean in their documentation it's there but the code is not working factorer V Corpus one transform output V do transform Corpus one yes and then let's say V Dot see it's working so something is wrong maybe it should be problem with my escalar version uh but anyways uh I have switched to collab so that so same okay if you are facing difficulties you switch to collab okay um there could be a difference in the versions but I don't have time to debug that but anyways uh I'll try to copy paste this directly here so that we see the IDF score for the entire corpus now you can see for Apple it is 2.38 okay similarly what I will do is just for our understanding I will make another Corpus and I will try to have apple name more frequent and we will see whether the ID TF IDF score of Apple is still the same or changing okay so I've have defined my second Corpus I just making it V2 V2 uh uh Corpus 2 everything else Remains the Same transformed output two sorry sorry sorry Corpus 2 and then I will directly take this uh V2 word in all featured names okay uh let's say two two index2 V2 vocabulary. 
getet word IDF score2 V2 um index 2 IDF score 2 okay you can see this right Apple's ID TF IDF score has decreased from 2.38 to 1 point something what about other words let's say already which is 2.38 already has not changed because I have not impacted already right but if you see that Google Samsung and Microsoft these words are taken out right so potentially here there is no Google that's why here you are not able to see any word called as Google right but in the First Corpus Google is there and Google score is 2.38 right please check get feature names is it uh but okay we can try get feature names yeah it's working yes okay so maybe some local uh version issue but anyways good good this is how we need to fix problems right any any issues that we Face we do some research and then fix it up now we have chat GPT in our days there was no chat GPT right we were using stack Overflow so understood right this is how you I will write it down as tfidf collab okay now as an exercise what you can do today is uh the exact use case that I used in bag of words try to take the exact use case and try to create TF Ida vectorizer and see whether the models perform as performance is increasing or decreasing okay as an exercise you can do this so we we'll quickly jump into the next topic we'll see how much can we cover today if not we will be covering uh next class bag of words is done TF IDF is done what to V okay let's try to cover what to V and then potentially next topics like Transformers birth models we will try to cover tomorrow okay so moving back to my mirror board bag of words and TF IDF is understood so we will talk about the next one which is what to W so what to is a prediction based uh word embedding model and it is a very very famous model it's a very good uh word embeding model okay so when somebody asks you in interviews you can potentially talk about uh primitive techniques like bag of words and TF IDF and recently uh what what to Vector you can also explain because 
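Before moving on from TF-IDF, it may help to see where scores like 2.38 come from. scikit-learn's TfidfVectorizer, with its default `smooth_idf=True`, computes idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing term t. A minimal pure-Python sketch on a small hypothetical corpus (not the exact corpus used in the session):

```python
import math

def smoothed_idf(n_docs, doc_freq):
    # scikit-learn's TfidfVectorizer default formula (smooth_idf=True):
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

# Hypothetical 7-document corpus, purely for illustration.
corpus = [
    "thor ate pizza",
    "apple released a phone",
    "samsung released a phone",
    "google released a phone",
    "i like pizza",
    "pizza is tasty",
    "phones are expensive",
]

def doc_freq(term):
    # Number of documents that contain the term at least once.
    return sum(term in doc.split() for doc in corpus)

print(round(smoothed_idf(len(corpus), doc_freq("thor")), 2))   # 2.39 -- rare word, high IDF
print(round(smoothed_idf(len(corpus), doc_freq("pizza")), 2))  # 1.69 -- common word, lower IDF
```

Note how a term appearing in only one of seven documents lands at about 2.39, very close to the 2.38 seen in the session, and how repeating a word across more documents pulls its IDF down, which is exactly the behaviour we observed with "apple".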
Word2vec is a more recent technique than those two (it was introduced by Google in 2013) and a very good one. It is a prediction-based word embedding technique, and there are two ways to use it. One is a pre-trained model: Google has built a word2vec model on a huge Google News corpus, on the order of 100 billion words, with 300 features, and you can leverage it to convert your own data into 300 features. The other option is to create your own model.

Some of you might be confused about what "pre-trained" means. A pre-trained model is something already trained by a big organization; Google has already used that enormous corpus to create a 300-feature word2vec model, and if you want, you can simply leverage it. Now suppose you have a specific problem statement, the same sentiment analysis example, with reviews labelled positive and negative. Earlier we used bag of words to convert the text to numbers; now we will use word2vec instead, and we have two choices: use Google's model or train our own. Obviously, if you use Google's model the accuracy and performance will usually be better, because it has already been trained on so much data. If you run your sentences through it, every sentence gets converted into a vector of 300 features: sentence one has 300 numbers, sentence two has 300, and so on. Once this numerical data is ready, you apply the same technique as before: split into training and testing data, train the model on the training data, test it on the testing data, and check the accuracy. Anybody has any concerns here? If not, I'll proceed and explain the intuition behind it.

Let's go through the code; I'll title it "word2vec pre-trained". We are using gensim: `from gensim.models import Word2Vec, KeyedVectors`. It's running fine for me; in case it is not running for you, switch to Colab. The Google News vector data, Google's pre-trained model, is already in my Google Drive because I have used it before. To mount Drive you use `from google.colab import drive` and `drive.mount(...)`, but that will not work on a local system, and the model file is about 1.5 GB, so rather than waste time downloading it I will switch directly to Google Colab and open a new notebook. By the way: whatever has been taught today, please spend some time practising and come back with questions. If you don't come back with questions, that means you are not serious, as simple as that.

So `drive.mount('/content/drive')` mounts your Google Drive: it asks for your details, you grant permission, and once it finishes loading you can see your Drive in the file browser. Inside my Drive there is the folder with the Google News file; right-click it, copy the path, and load it with `model = KeyedVectors.load_word2vec_format(path, binary=True)`. Oh, `binary=True` is also needed, my bad.

Now that I have the model, let me call some random words and look at their feature values, because as I told you, this model is already trained on billions of words. If I pass any word, say "human", I get its array representation, and if you start counting the numbers, one, two, three, four, there will be 300 of them. To confirm, you can always check the shape: `model['human'].shape` gives (300,), because we are using Google's word2vec model, which was created with 300 features. You can do the same for "man", "woman", "king" and so on.

You can also find similar words inside this model. The most similar words to "man" are "woman", "boy", "teenager", "girl". The most similar to "keyboard": "keyboards", "touchpad", "trackpad", "keypad", "stylus". Most similar to "cricket": you might expect "sports", but its score would be a bit lower; instead you get "cricketing", "cricketers" and so on. Apart from that, you can compute a similarity score: the similarity between "man" and "woman" is about 0.766, while the similarity between "man" and "python" is very low. Just use this notebook and try things out. There is also a method called `doesnt_match`: you pass a list like ["java", "javascript", "python", "elephant"] and it should pick the odd one out. Here it picked "javascript", which is not correct, so yes, the model has its flaws.

One more thing I'd like to show. Let me create a variable: `x = model['king'] - model['man'] + model['woman']`. This is the very classic example of how a word2vec model works. Instead of printing x, I should call `model.most_similar([x])` and see which words are closest to the result of king minus man plus woman.
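As an aside on those similarity scores: what gensim's `similarity` reports is the cosine similarity between the two word vectors. A minimal sketch with tiny made-up 3-dimensional vectors (real word2vec vectors have 300 dimensions, and these numbers are invented purely for illustration):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: related words point in similar directions.
man    = [0.9, 0.2, 0.1]
woman  = [0.8, 0.3, 0.1]
python = [0.1, 0.1, 0.9]

print(cosine_similarity(man, woman))   # high: related words
print(cosine_similarity(man, python))  # much lower: unrelated words
```

This is why "man" vs "woman" gives a high score like 0.766 while "man" vs "python" gives a low one: the score only depends on the angle between the two learned vectors.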
The most similar word to the result is "king" itself, and then comes "queen". So basically king minus man plus woman is queen; that is the interpretation.

So this is how you can leverage an existing pre-trained word2vec model: convert your data into vector representations and then follow the same steps. Just copy the bag-of-words notebook, change only the vectorization step, call the model, and eventually, for one particular use case, you will be able to see whether bag of words, TF-IDF or word2vec performs best.

The last topic before closing: how do we create our own word2vec model? After this we will close; three hours have already passed and I cannot speak much more, I need a break. Let's say I want to create my own model. The goal is very simple: find the semantics between words. Say I have a vocabulary of five words. For those five words I need to decide how many features I want to create; it could be 10, it could be 20, it depends on you. For simplicity I'm taking five features for five words. The five words are king, queen, man, woman, elephant, a very classic example. The features I'll label F1 to F5, because in a real model the features are not known; just for simplicity, let's say F1 is gender, F2 is money, F3 is power, F4 is weight and F5 is "can speak". Since we are creating our own word2vec model, I need to give numbers, each on a scale of 0 to 1.

Gender: king is male (1), queen is female (0), man is 1, woman is 0, and let's say the elephant is male, so 1. Money: a king is very wealthy, and king and queen are both wealthy, so 1 each; a man is somebody like us, still has some wealth but nothing like a king, so 0.3; for the woman, gender equality, also 0.3 (if I gave 0.2 you might complain why the woman gets less); the elephant has no money, 0. Power: the king has a lot of power, so 1; the queen also has power, but more of it is with the king, so 0.7; the man 0.2, the woman also 0.2; the elephant has a good amount of physical power, so 0.5, or make it 1 if you want, up to you. Weight: king 0.8; queen on the lighter side, 0.4; man 0.6, or 0.8, whatever you want to give; woman 0.3, 0.4 or 0.5, whatever you want; the elephant's weight is high, so 0.9. Speak: 1 for all the humans and 0 for the elephant.

So the vector representation of king is [1, 1, 1, 0.8, 1], man is [1, 0.3, 0.2, 0.6, 1], and woman is [0, 0.3, 0.2, 0.5, 1]. Going back to the earlier example, king minus man plus woman: with the pre-trained model we got "queen" among the most similar words, almost a 73% match. Let's do the same activity here, element by element. Gender: 1 - 1 + 0 = 0. Money: 1 - 0.3 + 0.3 = 1. Power: 1 - 0.2 + 0.2 = 1. Weight: 0.8 - 0.6 + 0.5 = 0.7. Speak: 1 - 1 + 1 = 1. So king minus man plus woman is [0, 1, 1, 0.7, 1]. Now compare that with queen's numbers, [0, 1, 0.7, 0.4, 1]: the 0 matches, the 1 matches, 0.7 against 1 is close, 0.7 against 0.4 is "almost", and the final 1 matches, so overall it's roughly a 70 to 80 percent match. So that is word2vec: you can use a pre-trained model, or you can create your own.

Let me proceed in the code. What I want to show you here is a heat map; I'll copy the code directly from my workshop folder so everything is shown to you. The first piece of code shows the initial 50 of the 300 features. The next piece takes only 10 features for a handful of words, builds a DataFrame from them, and plots it as, not a hash map, a heat map. On the y-axis you see the different words, and on the x-axis the different features: feature 0, feature 1, feature 2 and so on up to feature 9. On one feature, for instance, "man" has a higher number and "woman" also has a higher number while "king" has low numbers; I don't know, it could be something like whether the word refers to an ordinary person or not. The feature names are not known, which is why they are just labelled 0, 1, 2, 3. But if you ever want to visualize your features, you can do it like this.

That is it. In the next sessions we will get into Transformers: tomorrow, the first hour will be on Transformers and GPT architectures, which are most important for generative AI; the second hour we'll get into RAG pipelines; and the third hour will be the project. If we are not able to complete it, we will try to extend the class. I'll keep the call open for the next five minutes in case you have any questions; in the meanwhile I'm downloading this notebook and pushing it to GitHub, just in case.
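The hand-built table above can be coded up directly. This is a minimal pure-Python version of the class example; the five feature values are the illustrative numbers chosen on the board, not learned values (a real word2vec model learns its features from data):

```python
import math

# Hand-crafted 5-feature "embeddings": gender, money, power, weight, speak.
vectors = {
    "king":     [1.0, 1.0, 1.0, 0.8, 1.0],
    "queen":    [0.0, 1.0, 0.7, 0.4, 1.0],
    "man":      [1.0, 0.3, 0.2, 0.6, 1.0],
    "woman":    [0.0, 0.3, 0.2, 0.5, 1.0],
    "elephant": [1.0, 0.0, 0.5, 0.9, 0.0],
}

def cosine(a, b):
    # Cosine similarity, the same measure gensim uses for most_similar.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# king - man + woman, element-wise.
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
print([round(v, 2) for v in target])  # [0.0, 1.0, 1.0, 0.7, 1.0]

# Which vocabulary word is most similar to the result?
ranked = sorted(vectors, key=lambda w: cosine(target, vectors[w]), reverse=True)
print(ranked[0])  # queen
```

Here "queen" ranks first outright; with the pre-trained model the raw result vector is closest to "king" itself, with "queen" next, because the query words are not excluded when you pass a raw vector.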
The call is open, so you can ask me doubts if you want. "What is BERT?" BERT will be covered tomorrow, but in short: BERT is an encoder architecture. After classical NLP we have Transformers, and all of this generative AI is nothing but Transformers. In a Transformer there are two parts, an encoder and a decoder: the encoder takes the input, encodes it and sends it to the decoder, and the decoder is the one that generates the final output. BERT is an encoder-only architecture, so there is no decoder there; BERT is good for certain NLP tasks, but you cannot solve everything with it, it has its own limitations. Similarly, the GPT architectures behind generative AI are decoder-only architectures. So tomorrow, in the first half, we will cover Transformers, encoder-only and decoder-only models, and then jump into the other topics.

The Google News data I will not be able to upload to GitHub because it is a 1.5 GB file, but I can share it in case you want to download it, and I will mention it in the README. Thank you all; I think it was a productive session. For some of you it was a little too technical, but that is how it is: a workshop is a bit fast-paced, with its pros and cons, though more pros than cons. Recordings will be available; I'll just have my lunch and quickly upload them. In fact, I'll create a new course on Zep Analytics, an internal course, not publicly available, free of cost, and send you the course link in the WhatsApp group so you can join directly and see the videos there. The code you anyway already have; I'll also send it in the chat box. Thank you all, see you tomorrow. Tomorrow we'll also have a poll on how your understanding of GenAI was before and after this workshop. See you, thanks. So we will
get started with the introduction to Transformers. In the previous classes, in yesterday's class, we learned the basics of NLP: tokenization, stop-word removal and many more things, which will help you, not directly but indirectly, because these are all core concepts without which getting into generative AI is not really recommended. So let's go ahead and try to understand Transformers.

Before explaining anything about Transformers, I'd like to generate something, say a ten-line paragraph about MS Dhoni; everybody knows who Dhoni is. Now this is the text I have. I'll give you two minutes: try to translate it into your local language, Hindi or whatever language you know, in your head, don't write it in the chat box, and after two minutes tell me what difficulty you face. ... Just one more minute. ... Okay, can somebody write down what the difficulty is? First of all, is it even easy to translate? If I gave you 30 minutes you would probably manage, but doing it immediately is difficult. What is the problem we face while translating such a huge message? You say it's easy to translate everything? Okay, then I have removed the text; now write down the translation. Now you cannot, right?

So try to understand the technical problem here. When you try to translate a big passage into Hindi or any other language you prefer, the problem is that our brain is not able to process everything in one go. If I ask you to read all of it, then delete it and ask for the translation, you will not be able to do it, because your brain processes it line by line. If I give you one line, you can do it very easily, and the next line easily, because our brain is good at processing sentence by sentence. Similarly, when you work on NLP problems with huge amounts of data, the same limitation appears.

In the background, the way we deal with NLP use cases is with neural networks; that is where deep learning comes into the picture, and it is difficult to make that work if your model is not good enough. So a few years back, researchers identified these challenges in natural language processing: how do we make machines understand and generate human language more efficiently and effectively? Traditional models struggled with long sentences and complex structures, and performance went down on tasks like translation and text generation. That is where Transformers were created, about seven years ago, in 2017, and it was Google Brain that created them.

Now let's understand Transformers first in layman's terms, and then a little more technically. First of all, what problems does a Transformer solve? Machine translation, question-and-answer systems, chatbots, text summarization, sentiment analysis: in short, pretty much every problem in the natural language space, when we are dealing with text, is now being solved by Transformers. Before Transformers came along we were still solving NLP problems, no doubt about it, using the techniques I explained in the last class, but we had to do everything manually, step by step, and then manually call the model. So before Transformers came into the picture, let's
say we wanted to solve a sentiment analysis problem. We have all the reviews with their sentiments: positive, negative, positive, negative and so on. First we did stop-word removal, then case folding (converting everything to lower case), then removing all the irrelevant characters; I'll write all of that down as data cleaning. Then we did word embeddings, using bag of words or TF-IDF or word2vec, and once the data was vectorized we called a machine learning model for classification. That was the traditional approach. Using Transformers (we'll talk about BERT later), things can be done in a better way: more data can be used, and the data is understood better.

How is a Transformer able to do this kind of thing? What was the idea behind it? Transformers came in 2017, and the idea came from Google Brain. Literally everything we use right now, generative AI, ChatGPT, all of it, comes from this one source called Transformers. Many researchers have said that the people who wrote the Transformers paper were extremely talented; it was as if somebody time-travelled back to 2017 and told them that in five or ten years the world would need generative AI and something like this. It is treated like that: Transformers are a revolutionary thing in the space of data science and AI.

So what is a Transformer? A Transformer is simply a combination of an encoder and a decoder. Simple English words, right? Encoding is like taking your passport number or bank account number and encoding it, and eventually decoding it back. So, in simple terms, what is an encoder? An encoder is a listener, a skilled listener, like you: I am speaking right now and you are all consuming, all listening, so you are all encoders. Encoders take sentences (whatever input we provide is nothing but sentences), break them down internally, and understand the contextual meaning of the sentence.

For example: "I saw a python on the road" versus "Python is a programming language." These two sentences are different, and contextually, in the first sentence "python" is a snake, while in the second it is a programming language. Traditionally, if you encode these two sentences with bag of words, "python" is just "python". Say S1 is "there was a python on the road" and S2 is "python is a programming language". After stop-word removal, imagine you are left with "python" and "road" from the first, and "python", "programming", "language" from the second. The vocabulary is: python, road, programming, language. Sentence one: is "python" there? Yes. "Road"? Yes. "Programming"? No. "Language"? No. Sentence two: "python" yes, "road" no, "programming" yes, "language" yes. So the vector embedding of sentence one is [1, 1, 0, 0] and of sentence two is [1, 0, 1, 1], and the contextual meaning is not captured at all. This is where Transformers come into the picture: when you feed the data to the encoders, the listeners, they understand the contextual meaning. Imagine the sentence "the cat sat on the mat", a classic example. Who is most important here? The subject, "cat", and the object it is sitting on, "mat"; those are the most important keywords.
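The python/road comparison can be reproduced in a few lines. A minimal hand-rolled bag of words (assuming the stop words have already been removed, so the vocabulary is just the four content words):

```python
# Vocabulary after stop-word removal; order fixes the vector positions.
vocab = ["python", "road", "programming", "language"]

s1 = "python road"                  # from "there was a python on the road"
s2 = "python programming language"  # from "python is a programming language"

def bag_of_words(sentence):
    # 1 if the vocabulary term occurs in the sentence, else 0.
    words = sentence.split()
    return [1 if term in words else 0 for term in vocab]

v1 = bag_of_words(s1)
v2 = bag_of_words(s2)
print(v1)  # [1, 1, 0, 0]
print(v2)  # [1, 0, 1, 1]

# The "python" component is identical in both vectors, even though one
# python is a snake and the other is a programming language. Bag of
# words has no way to represent that contextual difference.
print(v1[0] == v2[0])  # True
```

That identical first component is exactly the limitation the encoder's contextual understanding fixes.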
The encoder is able to understand that pattern. How does it understand? I will explain some of the technicals too, not in full depth, because the Transformer topic by itself takes a lot of time to teach, but as simply and as quickly as possible; after the workshop, if you still have difficulty with any topic, let me know and we'll figure it out. The encoder then passes all the information it has consumed, with its contextual meaning, to the decoder.

Decoders are storytellers. It's something like this: in the class we have Sumit, we have Ravikumar, we have Kartik. Whatever I am telling you, I'm giving you the input, and you are the encoder. Now imagine you have a friend to whom you have to explain it. Will you be able to repeat to him, word for word, exactly what I explained in class? No; you will summarize it. So you are playing the role of a decoder. If you go to YouTube or various other platforms, Transformers are explained in a very complex way; the simplest explanation is this. I provide input; you listeners consume my knowledge word by word, without asking anything, and store it in your brain. Tomorrow, if your friend asks "can you explain Transformers to me?", you will not take an hour like I did; you will take around 15 or 20 minutes, not repeating me word for word but summarizing and explaining. So if somebody asks you: the encoder is a listener; the decoder is a storyteller that takes the understanding from the encoder, generates a new sentence, and gives you the output.

This kind of model is called a sequence-to-sequence model: the input is a sequence and the output is a sequence. The input is what I'm teaching you, in a sequence; the output is also a sequence, but in different words, a kind of summarization. Now, after Transformers were created, there were still difficulties in certain areas. Transformers were better than the previous models, but we are all part of evolution, right? We evolved from our previous instances, from apes to human beings, and we keep evolving and learning new things; people born 50 years ago versus us, we are better versions. Similarly, Transformers had some flaws, or I would not even say flaws, but researchers started asking: why do we have one complex encoder-decoder architecture? Let's try to divide it.

First, let me draw the encoder-decoder architecture so that we understand it theoretically and can see the structure. A note on the dates, since there was some confusion: the sequence-to-sequence networks paper came out in 2014, the attention idea appeared in "Neural Machine Translation by Jointly Learning to Align and Translate" around 2014 to 2015, and the Transformers paper itself, "Attention Is All You Need", was released in 2017. So the first paper was on sequence-to-sequence models, which were not yet called Transformers; that name came with the 2017 paper. The architecture I'm drawing now is the one explained in the 2014 sequence-to-sequence paper. What happens is: your input passes through embeddings, and then into these blocks; this is your encoder block and this is your decoder block, and
the decoder also passes things along step by step, and the final output goes through an activation layer. Many of you might not know what an activation layer is; it's a deep learning concept and I cannot go in depth into it here, so just keep a high-level understanding. So there is an encoder and a decoder; the encoder side has embeddings and inputs, the decoder side also has its own input, and the decoder combines its input with what comes from the encoder and gives you the final output. Something like this was explained in the 2014 paper. You can call it an encoder and a decoder, but this is not the actual architecture of a Transformer; that is why I told you that in 2014 it was not called a Transformer but rather a sequence-to-sequence model. And this model had flaws: on larger data this type of architecture did not work properly and the accuracy was very low, and that is when, in 2017, the revolutionary step came and Transformers were introduced.

So what does a Transformer look like? Let me take a translation example: we are translating "turn off the lights", and in Hindi the translation is "light band karo" (those who don't understand Hindi, that's completely fine; I'm just translating English to Hindi). The input comes in through an embedding layer; the inputs are "turn", "off", "the", "lights", and they pass through the different Transformer blocks. When the first word arrives, the model only has "turn"; then it has "turn off"; then "turn off the"; then "turn off the lights". As simple as that; this is how the data is passed on the encoder side. Then there is the decoder architecture, and the output of the decoder is "light band karo". So we have an encoder, we have a decoder, data from the encoder gets injected into the decoder, and inside the decoder there are different layers, self-attention layers and various other things.

Let me briefly explain what attention is. Take one output word: it is the translation of which part of the input? To predict it, do we need the entire sentence? No, and that is what is called attention. To predict "light" you just need "lights", not all four words. So if I label the inputs X1, X2, X3, X4, then to predict "light" you basically need X4 and the rest are irrelevant, while to predict the "turn off" part you basically need X1 and X2. This is the attention mechanism on which Transformers are built: you don't need the full sentence, only the relevant part of it.

Mathematically, for each decoder step i you compute a context vector from the encoder's hidden states h_j: c_i = sum over j of (alpha_ij * h_j), where the alpha_ij are the attention weights. So for the first context vector, c_1 = alpha_11 * h_1 + alpha_12 * h_2 + ..., and the h's are basically the encoder states for each input word. I know this is where deep learning comes into the picture, and if you don't understand deep learning it will be difficult to follow, so let's not get into the full derivation.
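The context-vector formula can still be sketched numerically. Everything below is made up purely for illustration (in a real model the hidden states and alignment scores are learned, not hand-set), but it shows the mechanics of c = sum_j alpha_j * h_j with softmax-normalised weights:

```python
import math

def softmax(scores):
    # Normalise raw alignment scores into attention weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up 3-dim encoder hidden states for "turn", "off", "the", "lights".
h = [
    [0.1, 0.9, 0.2],  # turn
    [0.8, 0.1, 0.3],  # off
    [0.1, 0.1, 0.1],  # the
    [0.2, 0.1, 0.9],  # lights
]

# Made-up raw scores for one decoder step that should focus on "lights".
scores = [0.5, 0.2, 0.1, 3.0]
alpha = softmax(scores)

# Context vector: weighted sum of encoder states, c = sum_j alpha_j * h_j.
context = [sum(a * h_j[d] for a, h_j in zip(alpha, h)) for d in range(3)]
print([round(a, 2) for a in alpha])    # most of the weight lands on "lights"
print([round(c, 2) for c in context])  # context dominated by the "lights" state
```

The point is only the mechanism: the weights pick out which input positions matter for this output word, exactly the "you only need X4 to predict light" idea above.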
Just understand the basic flow, so that you will at least be able to answer what attention is, what Transformers are, and so on. So that was Transformers. Now, evolution again: as we are part of an evolution, other things came along as well. Some researchers started analyzing this architecture separately, some analyzed the encoder, some analyzed the decoder, and that is where we got two different versions of it. One of them is called the encoder-only architecture, and the other is the decoder-only architecture, because the encoder has a different task and the decoder has a different task. Researchers split the entire Transformer architecture into two components, and this led to the development of encoder-only and decoder-only architectures. Now, what are encoder-only architectures? They are models which focus solely on understanding the input. Of course, you don't have the decoder part, only the encoder part, so the input is very important for encoder-only architectures; they were built to understand the input better. Encoder-only architectures showed some revolutionary results in specific tasks like text classification and sentiment analysis. For example, if you want to determine whether a movie review is positive or negative, an encoder-only architecture can analyze the review and give a clear answer. So what is a decoder-only architecture? In a decoder-only architecture you basically don't have the encoder part. In the full encoder-decoder architecture, what goes into the decoder as input is the output of the encoder; in the decoder-only version that part is simply not there. The output side is still there, which is understood: decoder-only architectures are good at generating text, just as encoder-only architectures are good at understanding the input. But if they are good at generating text, what
will be the input here? There is no encoder feeding it, right? So, understood: encoder-only architectures are good at understanding the input, which makes them good for tasks like text classification, sentiment analysis and so on, and decoder-only architectures are good at generating text. But my question, the first question that came to my mind when I studied decoder-only architectures, is: if the encoder is not there and its output is not there, what will be the input to this model? There is only an output. Well, there is an input, and it is called a prompt, and this is where a whole new stream of learning started, which is prompt engineering: how to write better prompts. In simple terms, decoder-only architectures are designed for generating text, and they are particularly useful in applications like chatbots. Let me write down the use cases: chatbots, text generation, creating articles, generating scripts, these kinds of things. These are the use cases of decoder-only architectures. Now one question might come to your mind: this one is good at something, that one is good at something, but we already had a combined model which does both, so why divide the architecture? If that came to your mind, the answer is simple. Imagine person A who knows skill X1, person B who knows skill X2, and person C who knows both skills but is not a pro at either of them. From a layman's point of view, people will think we should go to person C because he knows both, and there could be situations where both are needed. But when only X1 is needed, to whom will you go? You will go to A,
and for X2 you will go to B. And that is exactly what this is: encoder-only architectures are purpose-built for understanding and analyzing the input, and decoder-only architectures are built specifically for generating text. That's why, depending on the task, you go to one or the other. Now, what are the different architectures under encoder-only and under decoder-only? Under encoder-only is where BERT lives; somebody was asking about BERT yesterday, right? Different flavors of BERT are also there: we have ALBERT, we have RoBERTa, we have DistilBERT, and there are mini versions of them too. They are all variants, so if you understand BERT, that is more than good enough, because all the others are just modifications of the BERT model. It's something like cricket: somebody invented cricket, and then there are different versions of it, ODIs, T20, Tests, T10 and so on. If you understand what cricket is, good enough; later you can literally study their differences, even on ChatGPT, in a one-minute read. What exactly is BERT? First, let's also note that the decoder-only architecture is GPT, and this is where generative AI comes into the picture: generative pre-trained models. So let's first talk about BERT. Quick question, just yes or no: is there anybody here who already knows BERT? Okay, everybody is a no. Note it down: let's write the full form first, because from the full form you will understand a lot: Bidirectional Encoder Representations from Transformers. I will write the T in capitals to give respect to Transformers. This is BERT. Now, a quick guess from the full form: what
does it mean? Any idea? Focus on the "bidirectional" part. Bidirectional basically means that when interpreting a paragraph or a sentence, a human usually processes the text from left to right, but sometimes we also process it from right to left. When we process text from right to left it's not literally word by word reversed; imagine reading the sentence "MS Dhoni is one of the most celebrated cricketers in the history of Indian cricket". Sometimes we also scan it from the back, not word by word, but just to get the gist. In the BERT model the processing is both ways, forward and backward, which helps the model understand the context even better. That is the advantage of this particular encoder-only model, and that's why it was created as a specialization in the field of encoder-only Transformers. Now, BERT is very famous for solving problems like sentiment analysis, masked-word prediction and text classification. Let me write down some of the use cases of BERT: sentiment analysis; next (masked) word prediction, which it does flawlessly; and text classification. Next sentence prediction can also be done, but let's not count it, because BERT is not great at it: it would essentially reproduce a sentence that was part of its input data rather than generating something out of the box, so next sentence generation is more suitable for a decoder-only architecture. Understood? So I will not write that one down. Now, how does the architecture of BERT look? Let me try to draw it. Let's say we have X, which is your input, then we have the embeddings, and after the embeddings there are some encoding
techniques, which I will write down as positional encoders. And then comes the main masala: we have block one, block two, block three, dot dot dot, up to block n, so Transformer block 1, Transformer block 2, Transformer block 3, up to Transformer block n, and the final piece is nothing but your prediction head. Now, instead of n, imagine there are 12 blocks; that makes it the BERT-base model. The BERT-base uncased model architecture comprises 12 Transformer block layers, each with a hidden size of 768. Imagine each block is a neural network: we have an input, we have hidden layers, and we have an output. Now, let's say my X is "Ali is walking. Ali came from the garden." These are the sentences that go into X, and the word "garden" is masked. Masked means it is hidden, so to the model the input looks like "Ali is walking. Ali came from the ___". Then the embeddings are created, whatever they are, 1 0 1 1, they go through the positional encoders and through the Transformer blocks, and then comes the prediction head. What does the prediction head do? It predicts words. Let's say it predicts "garden", "city", "cool", "hotel", with probability scores of, say, 0.6, 0.1, 0.05 and 0.25; if you have more candidates you can add them, but this is just an example. As "garden" has the highest score, you get the final answer, "garden". Understood? Simple. This is exactly how an encoder-only architecture works. Now we will move a little faster, so in case you have doubts you can still ask me; there is nothing to worry about. Or, if you want time to grasp it, watch this video once more once the recordings are released.
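The prediction-head step just described can be sketched with plain NumPy: a softmax over candidate-word scores picks the masked word. The logit values below are made up for illustration; a real BERT head scores the entire vocabulary, not four words.

```python
import numpy as np

# hypothetical logits from the prediction head for the masked position
candidates = ["garden", "city", "cool", "hotel"]
logits = np.array([2.5, 0.7, 0.1, 1.6])   # made-up scores

# softmax turns raw logits into probabilities that sum to 1
probs = np.exp(logits - logits.max())
probs /= probs.sum()

best = candidates[int(np.argmax(probs))]
print(best)  # -> garden
```

Whichever candidate gets the highest probability wins, which is exactly how "garden" beats "city", "cool" and "hotel" in the lecture example.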
If you have doubts, come back to the group and ask me, in the generative AI workshop group or any other group you are part of; I do respond to questions. Next up is decoder-only architectures, and the example is the GPT architecture. In the full model we have the encoder, from which data goes to the decoder, and the decoder gives you the output; here we don't have the encoder, so the input gets fed through a new element called the prompt. There are a few technical things here which you can read through if you want; I will skip directly to the GPT architecture. Positional encoding is something I'll give you an example of; you can see it in this presentation. I will potentially remove some of the very technical things which are not needed as of now, but if you want to deep dive, of course they are needed. I will be releasing this PPT as well. After the BERT model there are different variants of it, RoBERTa, DistilBERT and so on; ALBERT is also there, which is a lightweight BERT. You can use ChatGPT to generate the differences between them. Basically, in the BERT model there are 12 blocks, and each block is different from the others; if every block is identical, sharing the same parameters, that becomes the ALBERT model, as simple as that. That is the essential difference between BERT and ALBERT. And which one is better? Nobody can say. In the world of machine learning and deep learning we usually go for a trial-and-error approach: if you are trying to solve a text summarization or sentiment analysis problem, you should always try out different models, BERT, ALBERT, RoBERTa, and see which one gives you the best results, because depending on your dataset and your problem statement, different models will behave
differently. If there were a rule book saying RoBERTa is the best and ALBERT is not, everybody would have blindly gone with RoBERTa, right? It's not like that. So there are different BERT models, and we have to trial-and-error. Similarly, in the GPT space we also have different models: GPT-4, GPT-4o mini, GPT-3.5, Llama, Mistral, Mixtral, many are there. We really don't know which model will work best on your data, so it is always a trial-and-error process. Okay, positional encoding I will explain, and then we will jump to decoder-only architectures, which I have not yet explained in depth because that's the agenda now: we will be learning about large language models and various other things, which are nothing but decoder-only architectures. So let's go for a short break; we'll be back and then jump into the decoder-only architecture. Okay, we'll proceed. Before proceeding to decoder-only architectures, there was a question about positional encoding, so let me take the same example and explain it here on the right-hand side. Positional encoding, in just one line, stores the position of each token. What goes into the Transformer block is an addition of the embeddings and the positional encodings. So let's do the embedding: the input sentence is "Ali is walking. Ali came from the garden", and "garden" is the masked word. When we create the embeddings, each word is a token: "Ali", "is", "walking", "Ali", "came", "from", "the", "garden". Each token is converted into a vector representation: let's say "Ali" is vector A, "is" is vector B, "walking" is vector C, "came" is vector D, "from" is vector E, "the" is vector F, and so on.
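As an aside, the sinusoidal positional encoding from the original Transformer paper can be written in a few lines (BERT actually learns its position embeddings rather than computing them, but the role they play, giving each position a distinct vector, is the same):

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    """Sinusoidal position vectors:
       PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))"""
    pos = np.arange(num_positions)[:, None]          # shape (positions, 1)
    i = np.arange(d_model // 2)[None, :]             # shape (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even dimensions
    pe[:, 1::2] = np.cos(angles)                     # odd dimensions
    return pe

# 8 tokens from the "Ali is walking ..." example, toy model dimension 4
word_emb = np.random.randn(8, 4)   # stand-in for the word embeddings
pe = positional_encoding(8, 4)
x = word_emb + pe                  # this sum is what enters the blocks
```

Note that the two occurrences of "Ali" (positions 0 and 4) share the same word embedding but get different position vectors, so their combined inputs differ, which is the whole point.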
And the masked word, "garden", is, let's say, vector M. If you want to remove the stop words you can; it's up to you. I'm just giving you a very layman-level understanding of positional encoding. Now, about the positions: "Ali" is in the first position, "is" in the second, "walking" in the third, the full stop in the fourth, "Ali" again in the fifth, and so on up to the tenth position. So on one side you have the word embedding, on the other the positional embedding, and when you combine the two, what goes into the Transformer block is the word embedding plus the positional embedding. Understood? Simple. So if you have this kind of doubt in the future, try to formulate it in a way ChatGPT can understand, and you can actually rely on ChatGPT without waiting for somebody to answer you. Let me give an example of how to write a good prompt, because this is important: we now have ChatGPT, which can help you in your journey. It's not 100% accurate, but at least it helps. If somebody more knowledgeable is with you, of course you can ask them, and you can ask me, but if you don't want to depend on anyone you can simply use ChatGPT. I'll take the same example: "I have a sentence: 'Ali is walking. Ali came from the garden.' Here 'garden' is masked. As per my trainer, I understand that when we pass the input to embeddings, there is also a positional embedding before we feed data to the Transformer block. What exactly is the positional embedding doing? Explain it to me in layman's terms using the same example." I wrote something like this, and if you write it this way you get a proper breakdown: each word taken as a token and explained, then the position information being added, then the word and position embeddings being combined, and then the answer. So you don't have to rely on anybody; you can easily leverage ChatGPT, but you have to write the prompt so that it carries the context, because the problem is understood by you, and you have to pass similar information to ChatGPT. Okay, jumping to the next topic: decoder-only architectures. Before getting into them, let me clear up what a large language model is. There is a wrong interpretation that a large language model is generative AI. No, it is not. Another speculation is that a large language model is a decoder-only architecture; again, wrong. Large language models are simply language models trained on billions and trillions of tokens; they are Transformers trained on a large corpus of text data, simple. An LLM can be a BERT model or a GPT model; it can be encoder-only, decoder-only, or encoder-decoder, like the T5 model, which is a sequence-to-sequence Transformer. All of these are large language models, so "large language model equals generative AI" is a wrong speculation. Now, what is generative AI? The name comes from GPT, which stands for generative pre-trained Transformer. Focus on the word "pre-trained". In the space of AI there are two things: one is pre-training, the other is fine-tuning. Pre-trained models are those that are
already trained on a huge amount of data. How many of you know deep learning? Just write yes or no; if no, I will avoid examples specific to deep learning. Okay, no issues. Pre-trained models exist everywhere: in NLP we have pre-trained models, and in deep learning we have pre-trained models. Some of the deep learning pre-trained models are Inception V3, Inception V4, MobileNet; let me show you some examples directly: ResNet, SqueezeNet, MobileNet, AlexNet, and these are all models trained by big organizations, GoogLeNet by Google, the others by other big labs. These organizations invested a lot in preparing pre-trained models, and the reason is that if tomorrow a user or a company wants to work on a problem statement, instead of creating their own architecture, which would consume a lot of resources and cost, they can directly use a pre-trained model's brain and feed in their own data, which is called fine-tuning. In simple terms, if you ask me what a pre-trained model is: let's say you are person A and there is a person B who is very knowledgeable; he knows a lot about English movies, politics, news. You don't have much knowledge about, say, political scenarios. If you have a question related to elections in the US, can you go to this person? You can. Basically, you are going to him and leveraging his brain; that is a pre-trained model. Fine-tuning is when
you are leveraging his brain and also providing some information from your side to get a better result; that is fine-tuning. So in the space of AI there are already pre-trained models. Now let me explain in the context of generative AI rather than deep learning. Take GPT-4: it is trained on trillions of tokens. When you go to Poe or ChatGPT or any online tool and ask "Who is Sachin Tendulkar? Write two to three lines", what is happening is that you are directly asking the brain, and GPT responds with an answer. Fine. But imagine you have your own document. GPT is already a model pre-trained on tons of data: it has Sachin's information, news, sports, Wikipedia data, stock market data up to a certain year, maybe 2023, so if you ask a question it will answer; you are dealing directly with the pre-trained model. Now imagine you have a document, say your own research paper, or PDF notes from your teacher. What you are doing is using this document along with the model to create a better version: when you ask a question as a user, you are fetching the answer from the research paper or your notes, but you are using the model's knowledge to do the chat completion, because for chat completion you need a generative model, since ultimately your output is going to be generated. So this is basically you using your own data,
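A minimal sketch of that "your document plus the model's brain" idea: one common way to realize it is simply to pass your notes as context in the prompt you send to the pre-trained model. The function name and prompt wording below are my own illustration; the actual network call is omitted.

```python
def answer_from_notes(question, notes):
    """Combine the user's own document with a question into one prompt
    that could then be sent to a pre-trained chat model."""
    return (
        "Answer the question using only the notes below.\n"
        f"Notes:\n{notes}\n"
        f"Question: {question}\n"
    )

prompt = answer_from_notes(
    question="What does the positional embedding do?",
    notes="Positional embeddings store each token's position and are "
          "added to the word embeddings before the Transformer blocks.",
)
print(prompt)
```

The model answers from your notes while its pre-trained knowledge handles the language generation itself, which is the division of labor described above.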
and this is called fine-tuning. This is what we usually do in industry-level problems, because creating a new large language model from scratch is nearly impossible: for small companies it's not feasible, and for individuals, forget about it. Big companies who have the money, the resources, the GPUs and TPUs can invest and work on large language models. There are different large language models, GPT-3.5, GPT-4, GPT-4o mini, many options, with different configurations, different token limits, different context windows; Llama is there, Mistral is there, and others. If somebody asks, you can talk about some of them, and I'm pretty sure nobody will ask you the difference between GPT-4o and GPT-4o mini with respect to token size or context window, because those are questions very specific to one model, and as a GenAI engineer it's not mandatory to remember such things. But generally speaking, if somebody asks, you can say GPT-4o is definitely better than 4o mini, because 4o mini is a miniature version with fewer capabilities, so obviously 4o is better. And if somebody asks about GPT-4o versus Llama, you can say Llama is an open-source model while OpenAI's models sit behind an API, so it depends on your use case. Let me quickly give you some ideas about the different types of models, and a tabular view which can help you articulate an answer if you get asked which models are best. In the field of generative AI there are different types of models; I will divide the models into
open source and APIs, and inside APIs, private APIs and public APIs. Some of the open-source models are Llama 3, Mistral, Mixtral, Zephyr, and many Hugging Face models. Private APIs are, for example, the model APIs exposed by cloud providers like AWS and Azure, owned by those private cloud vendors. Public APIs are like OpenAI: you can easily create your account and get started. Now, from an interview point of view, if somebody asks which one you would prefer for a specific use case, the general answer is that performance-wise it depends; which one gives better-quality answers depends entirely on the use case, and we need to trial-and-error. But if somebody asks for the best choice for a proof of concept, and how you would like to see the same application deployed on a client's premises: for a proof of concept, always go with OpenAI. The reason is that it is easily accessible; you just create an API key and you're done, less headache, ease of use. Talking about deploying on a client's premises, it depends on what the client asks for. If the client is focused on performance without worrying about cost, we can go for open-source models, but open-source models are usually 8 GB or 16 GB in size; they are physical model files available online. If you go to Hugging Face and look at a Llama 3 model, you can see the size: 5 GB here, but there are two tensor files, so around 10 GB. So the thing is, if the client has no budget issues and is okay to invest in storage, either
on premises or in the cloud, they can go for this option, but it is an expensive option; the easiest option will be OpenAI. So it depends from client to client. If the client has no problem with data privacy, blindly go for OpenAI; even if the client has data privacy concerns you can still go for OpenAI, because OpenAI has clear data privacy policies. Only if the client insists, "No, I don't want OpenAI, my data is very critical, I don't want to take even 1% risk, I'm okay to spend money, I want models on my premises", then go for open source; 16 GB and 24 GB models are also available. So the general idea is this. The evolution of GPT happened something like this: ChatGPT was officially announced in late 2022, and now we have GPT-4. GPT-5 was also announced a couple of months back; whether it has been fully released publicly I don't know, but all the models I'm using right now in my current company or in POCs are GPT-4, and even if GPT-5 is released I will not use it immediately, because when OpenAI releases a new model it is very expensive, and they make it cheaper over time. So, generative AI: the general idea is that it is the generation of text, images and audio. Fundamentals of neural networks are not needed here, so I'll jump directly into LLMs. What are LLMs? LLMs, as I told you, are large language models, developed to understand and generate human-like text. These large models are trained on vast amounts of data and learn the patterns, structures and semantics; they are capable of generating contextually relevant text by understanding the relationships between words and phrases. We'll talk about prompt engineering first and then go to RAG, no
problem. Okay, prompt engineering in a nutshell. What goes into the input of the decoder is nothing but a prompt, and there is a way to articulate prompts: the better you write, the better the output will be, as simple as that. There are majorly five to six pointers on what should be part of a prompt: first is context, or topic; second is tone; third is formatting, the output format; fourth is exemplars, or examples. What is topic? Topic means the context: what exactly do you need? If I ask you randomly, "Can you tell me what happened in that high-level design we discussed yesterday?", you have no clue, because you were not part of that discussion. Your follow-up question will be, "Which high-level document are you talking about? Did we discuss it yesterday?" In a human-to-human discussion, if you don't understand, there will be follow-up questions. But when you ask a question to generative AI, say ChatGPT, ChatGPT is not a human; it will still answer you. Watch: I will write, "Can you explain what are the key areas of the HLD document we discussed?" See, it is giving me some answer anyway; it has no context, so it is basically giving me random answers. If I ask you the same question, you will not be able to answer; you will ask, "Sir, which document are you talking about?" So topic, which is nothing but context, is very important. Tone is important too: how do you want ChatGPT to answer, humorously, professionally, like a friend? And formatting is also important: how do you
want the answer formatted, whether you want bullet points, whether you want some closing points at the end. And examples: if you provide examples, it is even better. Let's say I ask, "Write a YouTube script." It is still able to answer, on some random topic like "five tips for effective management", but in this prompt nothing is there: no topic, no format, nothing defined. Instead, let's say I give a prompt like this: "Write a YouTube script for the topic 'Introduction to BERT models'. Explain the Transformer models, why BERT was created, and the different versions of BERT. Start the script with a hook and end the script with a question. Also include some lines on how I should edit this video." What I'm trying to say is that I want the script and I also want editing ideas. And you can see it gives me: opening scene, energetic background music fades in, "Introduction to BERT models: hey everyone, have you ever wondered how machines understand human language? Today we are diving into one of the most groundbreaking advancements in natural language processing." This is a very good script, no problem, but can we make it better? Of course, because if I'm doing a video on this topic there is no introduction about me yet. So I will give an example: "One example of my scripts is: 'Have you ever wondered how machine learning works? My name is Satyajit Pattnaik and I have over 13 years of industry experience, and today I will be talking about machine learning.' This is how I usually start my scripts." Now let's see. It did not reproduce my name exactly at first, which is fine, but it has generalized the pattern: "Have you ever wondered how machines can understand human language? My name is Satyajit Pattnaik and I have 13 years of experience in AI and natural language processing. Today I'm excited
to introduce you to BERT models and how they work." So you have to write prompts in a proper way. Now, prompts come in different types or structures. If somebody asks how many types of prompting there are, you can say: zero-shot prompting, one-shot prompting and few-shot prompting. We can even use ChatGPT to explain it: "Prompting is of three types: zero-shot, one-shot and few-shot. Can you explain this with examples? Make it short, I don't want a lengthy answer." This is for our reference. Zero-shot prompting is the one with no examples: something like "Translate 'hello' to French" is zero-shot; no examples are provided. One-shot prompting is where you give one example: "Translate 'goodbye' to French. Example: hello -> bonjour." So I'm forcing the answer to be in that format, and you can see it returns "goodbye" in French in the same pattern. And what is few-shot? Few-shot means you give multiple examples and then ask; multiple examples help the model understand the task better. Beyond these, there are a few other prompting techniques; let me ask about those too: "What do you mean by the chain-of-thought prompting technique and the tree-of-thoughts prompting technique? Give me an example of both." Okay, my tokens are exhausted, I guess; okay, we are back. So there are a few more techniques apart from zero-shot, one-shot and few-shot, called CoT and ToT: CoT is chain of thought, ToT is tree of thoughts. The chain-of-thought prompting technique involves breaking a problem down into a sequence of logical reasoning steps.
The Prompt is calculate the total cost if a book cost this and you buy three books first find the cost of book so basically you are giving them the steps first do this and then do that right so imagine I will try to Twig this and then instead of this I will write down one more line that first talk about the Transformers so let's say I twick it I should make sure you start with evolution of Transformers and how bird came into picture and then go ahead and explain the different versions of but now this is basically Your Chain of Thought right you are giving them the input that this is how you need the answer chain so one by one by one okay and tree of thoughts is this technique involves exploring multiple branches of reasoning Alternatives at each step it encourages the model to consider different possibilities or paths to reach the conclusion let's say decide on a vacation destination considers fact like so you are asking them to consider different facts and basically creating multiple scenarios okay that is tree of thoughts what is a tree tree is something like this right okay so what is a tree let's say you are trying to see whether you will be interested in a job or not right take the job or not first you check salary if salary is less than 20 lakhs no I will not take if the salary is 20 to 40 lakhs and if it is in Bangalore yes I will take it if it is in Kolkata no I will not take it and then if uh you know some other things so basically this is how decision tree works right this is what a tree is so if you want a scenario like this you have to phrase it up like this like consider these these these factors based on which create different criterias that is a tree of thought now these are all con conceptual things okay only in I mean only helpful in interviews but to write the best prompt you need not have to blindly follow these techniques slowly slowly your level of writing prompts will automatically be better and better over the period of time when I started 
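The prompt types above can be sketched as chat-style message lists. This is a minimal illustration in the OpenAI chat-completions message format; the model call itself is omitted so it runs without an API key, and the translations and step wording are my own illustrative examples, not prompts from the course:

```python
# Sketch: the basic prompt types expressed as chat messages
# (OpenAI chat-completions style). No API call is made here.

def zero_shot():
    # no context, no examples
    return [{"role": "user", "content": "Translate 'hello' to French."}]

def one_shot():
    # a single worked example forces the answer format
    return [
        {"role": "user", "content": "Translate 'hello' to French."},
        {"role": "assistant", "content": "bonjour"},
        {"role": "user", "content": "Translate 'goodbye' to French."},
    ]

def few_shot():
    # several examples help the model infer the task
    examples = [("hello", "bonjour"), ("thank you", "merci"), ("yes", "oui")]
    messages = []
    for en, fr in examples:
        messages.append({"role": "user", "content": f"Translate '{en}' to French."})
        messages.append({"role": "assistant", "content": fr})
    messages.append({"role": "user", "content": "Translate 'goodbye' to French."})
    return messages

# Chain of thought: spell out the reasoning steps you want, in order.
cot_prompt = (
    "Write a YouTube script on BERT models. "
    "Step 1: start with the evolution of Transformers and how BERT came "
    "into the picture. Step 2: explain the different versions of BERT. "
    "Step 3: end with a question to the audience."
)

# Tree of thought: ask the model to branch on criteria before concluding.
tot_prompt = (
    "Decide whether to take a job offer. Branch on these criteria: "
    "salary below 20 lakhs -> reject; 20-40 lakhs and Bangalore -> accept; "
    "20-40 lakhs and Kolkata -> reject. Walk through every branch first."
)
```

Any of these message lists would be passed as the `messages` argument of a chat-completion call; the one- and few-shot variants simply pre-load fake assistant turns as worked examples.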
When I started using ChatGPT a year and a half back, I was also writing beginner-level prompts and was not able to get proper output. But nowadays I leverage ChatGPT heavily — trust me, I do a lot of things with it. Although you cannot rely on it 100%, because it's not 100% guaranteed, I use it very often because I can validate whether the answers are correct, and only then do I pass the information on to whatever the task is. You should always validate; you cannot rely on it blindly. There are many instances in my life where ChatGPT has given me wrong answers.

For example, one task in my current company was going through a high-level design document. How do you validate? You check against different sources. Say you ask an academic question — about BERT, or whether RoBERTa is better than ALBERT, or how many blocks RoBERTa has. If ChatGPT says 12 blocks, go and validate it in the research papers, or search Google and check different blogs — the original creators have documentation on the internet, so you can validate from there. Don't blindly believe ChatGPT; there are many instances where it has failed, trust me. To become a pro with it, you need to validate things, and slowly, as your understanding increases, you will be able to validate more and more.

So I was going through this document, and my job was to comment on the design. I already had my comments in mind and had started posting them, and then I thought: how about this — by the way, I have built a project on this, which will come in the AI Masters program and on YouTube very shortly. The idea is that I used this design document plus the statement-of-work document to create a chatbot I could ask anything, like an expert in this domain. When I asked it, "Can you tell me what the problems in this document are?", it initially gave generic answers — this is missing, that is missing, and so on. But as I fed it more information, it started learning and gave better answers. When I started creating my comments, I told it — not ChatGPT, my own GPT model — "these are the comments I have raised; see how technically sound they are; raise similar technical comments on this architecture, not random ones", and then it produced much better results.

So it happens gradually — it does not happen on the go. If you use ChatGPT often, eventually you will be a pro at prompting. There is no need for prompt-engineering guides or courses. There are many books people distribute on social media — "100 prompts to improve this", "1,000 prompts" — no need. Just ask more and more questions, and you will see that in the next two or three months your way of questioning changes, and you will automatically understand, because the answers will come back in a better format.

Okay, now we'll quickly cover the RAG architecture, which is the most important thing, then a little bit about vector databases, and then I'll jump directly into the project — in the project you'll see how things actually happen. First, before talking about RAG — just FYI, there will be no break, I don't have time; only if I'm out of breath will I take one. One of the problems — I also explained this well in my AI Masters program; some of you are part of it and can watch that video — one of the examples to understand RAG is this: let's say I go to ChatGPT.
Imagine I'm using a Vodafone postpaid SIM. My monthly fee is something like 599, but my last month's bill was around 3,999 rupees — I'm just giving random numbers. As a customer I am annoyed, so I go to ChatGPT and ask: "My Vodafone bill for last month was 3,999 rupees. Why is it so? Can you please check?" As I told you, GPT models — generative models — are "brainless". When I say brainless I'm not saying they are bad; what I'm saying is that even when they don't have the context, they still give you answers. Don't take this as me criticizing GPT — it is still answering, right? If I asked you the same question, what would you say? "How on earth would I know? You must have made some extra calls — why are you asking me?" Anybody would say the same. But GPT gives some answers — generic answers, fine, but at least it answers.

This situation is called hallucination. Hallucination is a phenomenon in GPT models — in generative AI models generally — where the model does not understand the context properly and still generates answers that are contextually not right. So one problem with generative AI, with GPT models, is hallucination. Common problem.

The second problem: say I have a research paper. Of course this particular paper is available on the internet — it's my paper — but just imagine it were not. If it's not on the internet, GPT has no idea about it. Now if I want to ask something from this document — "Can you tell me what procedure is used in this paper?" — GPT will not be able to answer. GPT cannot answer for custom data.

Now, if I go to vodafone.com — sorry, Vodafone Idea — it's myvi.in now — and ask the same question there as a customer: "Why is my Vodafone bill for the last..." — it didn't even take the full message. As a customer, first of all the experience is not good, and second, it's giving me random search results. How do I get help? This is where Vodafone could act. What if Vodafone launched a chatbot which, once you provide some of your information, knows your profile, your usage, your details — fetched from Vodafone's own database — and answers your questions? The customer would be happy. Instead of that search box, I'd simply chat and get my answer: "Why is my Vodafone bill...?" And I'm not talking about a human agent behind the chat — I don't need a physical person, I need a complete AI bot that understands your profile, goes to the database, checks everything, and gives you the answer.

This is where RAG — which is all about custom data — becomes very important. If you want to feed your own data into an LLM pipeline, you need RAG. What is RAG? Retrieval-augmented generation, one of the most critical topics in generative AI. Because if you just keep using ChatGPT, what are you? Not a developer — a consumer. Go back to the first class: there are two sides, consumer and developer. It's good to be a consumer, no doubt, but we are on a journey to become developers. We should build products. And how do we build products?
Let's say I want to build a chatbot that is pretty much built on my research paper, so that whenever I ask something, I get responses from my paper. Say I have the paper as a PDF. In very layman's terms: I pass it into the Gen AI pipeline, and then I store the data somewhere — I'll draw a database, DB. After the storing is done, I create a front end; I've written it down as "UI". This is me asking a question. Let's say my question is: "Explain the paper's methodology in two lines."

Now what needs to happen? Of course, I need to go and search the database. "Paper's methodology" — those are the important keywords — I search, and I get some output back; let me write it down as "relevant information". Imagine this relevant information is five, six, maybe ten lines of text, but the output was requested in two lines. So you need a proper chat completion that matches what you are expecting — and this is where the chat completion sits: this data flows into it. The chat completion is nothing but your GPT model; you can write it down as OpenAI, say a GPT-4o model.

This is a very layman's architecture, and the entire thing is called a RAG pipeline, because you are using custom data. Imagine you were not using custom data: then you could call OpenAI directly — the way we were calling it earlier — and OpenAI would give you a response. If you do that, it's not a RAG pipeline, because you are not using custom data. But if you are using custom data, it is called a RAG pipeline.
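That layman's flow — question → search the store → relevant information → chat completion — can be sketched in a few lines of plain Python. The retriever below is a toy word-overlap ranking standing in for real vector search, the sample passages are made up, and the final model call is left as a comment:

```python
# Toy end-to-end RAG flow: retrieve relevant passages for a question,
# then build the prompt that would go to a chat-completion model.

def retrieve(question, store, top_k=2):
    # stand-in retriever: rank stored passages by word overlap with the question
    q_words = set(question.lower().split())
    ranked = sorted(store,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_rag_prompt(question, passages):
    context = "\n".join(passages)
    return (f"Using only the context below, answer in two lines.\n"
            f"Context:\n{context}\n\nQuestion: {question}")

store = [
    "the methodology fine-tunes a transformer encoder on QA pairs",
    "results are reported on three benchmark datasets",
    "the authors thank their funding agencies",
]
question = "what is the methodology of the paper"
prompt = build_rag_prompt(question, retrieve(question, store))
# `prompt` would then be sent to the GPT model for chat completion,
# e.g. via the OpenAI SDK's chat-completions endpoint.
print(prompt)
```

In the real pipeline the overlap ranking is replaced by embedding similarity over a vector store, but the shape of the flow is exactly this.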
Anybody has any understanding problem here? If not, I'll move into the more technical architecture. Any doubts? Clear. Just give me one minute — I'm getting a message from a student that I missed inviting her to the meeting, which is not good. One second... okay, anyway, moving on. Everything is clear, right? Simple.

Now the more technical architecture — I already have the diagram with me so I don't waste time drawing. Trust me, it looks complicated but it's super easy; I will explain it. Let me first delete all these boxes... Sorry guys, let's take a two-minute break — I'm talking to a student who had a miscommunication. I'll be back.

Okay, coming back to the topic. This is a very generalized RAG flow architecture, and any problem we solve maps onto it somehow. Let's take some examples — what are some RAG use cases? I'm adding annotations here.

One example is a research paper bot: say you are a researcher, or somebody interested in research papers, and you're tired of going through papers that are lengthy and very technical. You leverage generative AI to build a chatbot: pass in your research paper, ask it anything about the paper.

Another is an AI career coach: you pass in your resume and create a chatbot where you can ask anything about coaching — what are the problems in my resume, what are the best skills I should add, I want to learn this or that — your career coach.

Number three: you have a website and you scrape it, so you have a bunch of PDFs of the website's details, and from that you build a chatbot, so users coming to the website can get all their information through the chatbot.

Another example, again for a website: you have the website data plus a Q&A Excel sheet, and using both you build a chatbot — users get answers either from the website or from the Q&A sheet. Another project idea.

Another project idea is a study coach: you have a bunch of PDFs — statistics, machine learning, deep learning — and you build a chatbot to ask questions from those PDFs. For this you could arguably just use ChatGPT, since all that information might already be in its training data, no doubt — but sometimes the PDFs contain extra information we don't know about, so if there's a need, you can build use cases like this too.

Another use case I worked on in my current company involved Bloomberg. The company was using Bloomberg chat; they gave us access to all the chats via APIs, so we fetched the data and, using a RAG pipeline, created a chatbot that returned the final output in a structured format. That is also a RAG pipeline.

Now, what is common to all these problem statements? Data. Research paper — data. Career coach — data. Website scraping — data. Website plus Q&A — data. Study coach — data. Bloomberg chat APIs — data. It can be PDF, Excel, JSON, any format, but data is essential. Whenever you work on RAG pipelines — sorry, I'm getting distracted by multiple messages — you always work on custom data, because you are grounding your generative AI module, your GPT architecture, on your own data.

So let's get into the architecture quickly — we still have a good amount of time for the project, nothing to worry about, we're on schedule. The first thing is the data: a PDF. When you pass in the PDF, the first thing that happens is it gets converted into smaller chunks — smaller documents: document, document, document. From each document, embeddings are created. This is your embedding model — imagine it's a word2vec model. Then you store these embeddings somewhere: maybe in a tensor, or in a vector database, or in a vector index — but somewhere. That is the storage: you're storing the vector embeddings.

In parallel, you have a UI where the user asks a question. The question is also converted into a vector — maybe with the same word2vec model — and then you query whatever store holds the original data and get back the relevant information. Once you have the relevant information, OpenAI does the chat completion and you get the final output.

I'll wipe the board and go step by step, in my own style. We have a PDF document — for simplicity, stick with the research paper bot example: a chatbot grounded on this one paper. How many pages does it have? Mine has six pages, around five thousand-odd words — but just imagine it has six pages and, let's say, a round 10,000 words.
Okay, try to understand: the very first step is reading the file. After you read the file, the immediate next step is chunking, and when you create chunks you have to define your chunk size. So there is a concept called chunk size — say my chunk size is 2,000. Can somebody tell me how many chunks will be created? We have 10,000 words and a chunk size of 2,000 — how many chunks? Five? Anyone else? Five, right? Wrong — six chunks will be created, and I'll explain why: there is another parameter called chunk overlap. Say I define my chunk overlap as 200.

What are chunk size and chunk overlap? Let's take a very simple example. I'll generate some text — "Write about MS Dhoni in 10 lines" — just that much. Imagine this is my PDF content. How many words do we have? I don't have the patience to count — say 103 words. Imagine my chunk size is 20 and my chunk overlap is 5. Then words 1 through 20 form chunk one. For chunk two there is an overlap of 5, so it starts 5 words back: it takes the last 5 words of chunk one plus the next 15, giving another 20 words. That is the overlap. The third chunk works the same way, and so on.

So with 10,000 words and an overlap of 200: the first chunk has the first 2,000 words; the second chunk has the last 200 words of the first chunk plus 1,800 new words; and so on. Eventually you get six chunks. It doesn't matter how many chunks there are — it's all calculated internally; I'm only doing this for your understanding.
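The overlap arithmetic above can be verified with a few lines of Python. This is a minimal word-level chunker of my own (real projects would typically use a library splitter, such as LangChain's text splitters, which expose the same `chunk_size` and `chunk_overlap` parameters):

```python
# Word-level chunking with overlap: each new chunk starts
# (chunk_size - chunk_overlap) words after the previous one,
# so 10,000 words with size 2,000 and overlap 200 gives 6 chunks, not 5.

def chunk_words(words, chunk_size, chunk_overlap):
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break
    return chunks

print(len(chunk_words(list(range(10_000)), 2_000, 200)))  # → 6
print(len(chunk_words(list(range(103)), 20, 5)))          # → 7 (the toy 103-word example)
```

Chunk two really does begin at word 1,800, i.e. it repeats the last 200 words of chunk one, which is exactly the overlap described above.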
After chunking is done, embeddings are created for each chunk. These embeddings then get stored somewhere — it could be a tensor, but forget the technical terms: say a vector database, or else a vector index. Somewhere, they're stored.

Now there's a front end somewhere here — a user like me asking, "Can you explain which dataset is used in this analysis?" The question is also converted into a vector, and I go to the store with it. In the store there could be millions of vector entries, and here we have just one query. What you end up with is the input vector plus the relevant information — say these particular entries. Then you call the OpenAI model — just the GPT model — it does the chat completion, and you finally get the output. That is the flow of your generative AI architecture.

What is your input? You have a front-end interface — the UI — where you ask a question about your research paper: "Tell me about the dataset used." That's converted into a vector, and we fetch from the vector database or the vector index. (What a vector index and a vector database are, I'll explain in a moment — apart from those two technical terms, is everything else clear?) We're getting something back from the vector store: what is it? Your prompt — the prompt you're passing — plus the relevant information, and you send both to OpenAI. Under the hood it works exactly like this: you pass the input, "What is the dataset used?", plus the relevant passage, plus an instruction like "give the final output in just two lines". That combination is nothing but your prompt — the prompt template you're using. Somewhere here sits an LLM wrapper — say a LangChain chain — where the prompt goes in and the output comes out, and that output is what you see on the front end. When we get to the coding part, I'll walk through each and every piece.

Now, what is a vector database, and what is a vector index? Everybody knows the usual databases: Oracle SQL, MySQL, MS SQL, Snowflake, MongoDB. Those are all scalar databases. When you convert textual data to numerical data, you get vector data — and for a large input (10,000 words, say) the vector data is complex and entirely numeric. Scalar databases are not good enough to store it, which is why there are separate vector databases. Their advantage is that fetching vector data from them is fast; scalar databases like Oracle are slow for this — with huge amounts of vector data they simply don't work well. So vector databases are built for vector data, and one example is Pinecone. You can go to Pinecone's site and learn more — there's a page there that explains everything about vector databases.

And what is a vector index? A vector index is essentially local vector storage. Whenever you create a project — we'll create one now — we'll use a vector index, and the index will be stored on my local machine. In the project I'll show you, you can see the vector index sitting right here: local storage. If you want to use this chatbot, you need those files on your local system, otherwise it won't work. A vector database, on the other hand, is hosted: if I'm using a vector database, you can use the same chatbot too, because the database is accessible to everyone, whereas a vector index is a local solution. As simple as that.

There's also a concept called FAISS — Facebook AI Similarity Search — that we use. When we store the research paper's information, and you send your input and get back the relevant information, the fetching is done using FAISS. Many of you might be thinking: fine, the vector database has the data, but how is the relevant information actually fetched? There must be some brain, because OpenAI isn't involved at that step. That somebody is FAISS — similarity search.

The practical difference: vector databases are meant for production-level projects, when you have huge amounts of data — say hundreds of custom PDFs and a huge, high-dimensional vector space. There a vector index won't cope; a vector index is only for a local project sitting on your own computer. Vector indexes are not recommended for production — for production, always use a vector database. Examples are Pinecone, and AWS also has vector database offerings — I forget the exact name... yes, OpenSearch is one.
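The "brain" doing that retrieval is just nearest-neighbour search over the stored vectors. FAISS does it at scale; this pure-Python sketch shows the same idea with cosine similarity on tiny made-up 4-dimensional "embeddings" (the function names are mine, not FAISS APIs):

```python
import math

# Cosine similarity between two equal-length vectors.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Return indices of the k stored vectors most similar to the query —
# conceptually what a FAISS or vector-database lookup does.
def top_k_similar(query_vec, stored_vecs, k=2):
    scored = sorted(range(len(stored_vecs)),
                    key=lambda i: cosine(query_vec, stored_vecs[i]),
                    reverse=True)
    return scored[:k]

stored = [
    [1.0, 0.0, 0.0, 0.0],   # chunk 0
    [0.9, 0.1, 0.0, 0.0],   # chunk 1 (close to chunk 0)
    [0.0, 0.0, 1.0, 0.0],   # chunk 2 (unrelated)
]
query = [1.0, 0.05, 0.0, 0.0]
print(top_k_similar(query, stored))  # → [0, 1]
```

In the real project, a FAISS index (for example a flat L2 index) or Pinecone plays the role of `top_k_similar`, operating over embeddings with hundreds or thousands of dimensions instead of four.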
RDS is there too, and MemoryDB — I haven't used MemoryDB; I think it's new, and I think it's Redis-based. Every now and then new options pop up, so it's hard to keep track, but I have used Pinecone a lot, and I can show you Pinecone from one of my projects — creating a vector index in Pinecone is very simple. (LangChain, by the way, is like a connector between all these pieces — I'll explain it properly when I start the coding.) Here is an index we created in the vector database for one of our demo projects, and you can see all the vector data. I used one paper — the RAG paper — I can show you that paper too. This is the research paper bot project from our AI Masters program: we push the paper into Pinecone, and you can see the values being stored line by line. With a vector database the latency is better, because a local index can't hold huge amounts of data — that's when you use the Pinecone vector database.

Okay, that's it — let's jump directly into the project. We have 30 minutes; I'm not sure that's enough, but we'll try our best. This project is coming to my YouTube channel very soon anyway, and for those in the AI program, I released it just last night (I created it last night specifically for this workshop) and added the video to the course, so you can watch it there if you want.

Let me explain the project — it's a very interesting one, based on the principles of my tool. Anybody know about Zep Resume? It's a new tool we created for job seekers. Right now it's free until the end of this year; next year it will be chargeable — well, not completely chargeable, I don't know the pricing yet — but you can explore it. In this tool there is something called a career coach. I'll explain the background logic, then walk through the project code and how I built it, and eventually I'll send you the code — no worries.

So, what is the project? An AI career coach. A few things first: what is the input to this coach, and what exactly will it do? Forget the word "AI" — who is a career coach? Somebody who guides you. I could be your career coach, and for me to give you tips, what do I need from you? Your resume — and the resume is pretty much enough. Of course, if you call me and explain more about your profile, that's good, but whatever you would tell me in conversation should anyway be on the resume, because you can't personally explain everything about yourself to recruiters. So the resume carries all the information and plays the central role — only when you send me your resume can I give you inputs.

A resume is normally a three-to-four-page PDF. A typical page holds around 400 words, and resumes are lighter on text than that, so assume a resume shouldn't run more than a thousand words — logically it shouldn't. I can check my own resume if you want... you can see: literally a thousand words. A thousand words is not much. So do we need chunking here? In this case chunking isn't strictly needed — there will effectively be just one chunk — but make it a habit: always start with chunking.

So: chunking divides the data into chunks (here, just one). After chunking, we call the word-embedding step, and then — speaking specifically about this project — the embeddings get stored in a vector index, a local vector index, meaning on my system. If I delete it, that's fine; it's recreated when you start the application.

So you have your data, and we'll have a front end, like a chatbot. When you invoke the UI, you first pass your resume — let me adjust the diagram: you have a front end, you upload your resume, and it's stored in the local vector index. In the UI, a prompt is already defined. What's the prompt? "You are an AI career coach. Given the resume, give me a summary of the resume under these points: education, experience, projects, skills," and so on. You could also ask it to provide a resume score out of 100 — I haven't included that in this project, but you could. Based on this prompt, the app goes to the local index, gets all the relevant information plus the input you passed, calls OpenAI, and returns the output. That's the first flow.

Beyond that, this project has a second piece on the front end: an "ask a question" box. Whenever you ask a question — say, "Can you rephrase this?", or "I want to move into XYZ domain, tell me how to do it" — that is the question you've passed.
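The two prompts driving those flows might look like this — illustrative wording only, not the exact prompt strings from the course code; `{resume_text}`, `{relevant_chunks}` and `{question}` are placeholder names I've chosen:

```python
# Sketch of the two prompt templates in the career-coach project.

# Flow 1: summarise the uploaded resume under fixed headings.
SUMMARY_PROMPT = (
    "You are an AI career coach. Given the resume below, summarise it "
    "under these headings: career objectives, skills and expertise, "
    "experience, projects, education.\n\nResume:\n{resume_text}"
)

# Flow 2: answer a free-form question using retrieved resume chunks.
QUESTION_PROMPT = (
    "You are an AI career coach. Using the resume context below, "
    "answer the user's question.\n\nContext:\n{relevant_chunks}\n\n"
    "Question: {question}"
)

filled = QUESTION_PROMPT.format(
    relevant_chunks="13 years of experience as a lead data consultant ...",
    question="I want to move into AI architect roles; what should change?",
)
print(filled.splitlines()[0])
```

Each filled template is what actually gets sent to the chat-completion model, which is why the summary always comes back under the same headings: the structure lives in the prompt, not in the model.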
question goes directly to open AI or maybe here okay another flow Starts Here relevant information plus input another flow and then this output goes to the chatbot so it's little bit complicated because there are two processes in this project but overall is it's this okay I I'll explain one second and once I show you the application you will be able to understand so the application is already uh you know on on um AWS I just deployed it yesterday because on local machine I cannot show you on local machine I cannot show you because I um I am staying in Hong Kong right in Hong Kong open is banned that's why I cannot show you on my local machine I I'll potentially show you here okay so this is how your application front end is okay so now you are here this is your UI this is your UI okay first you are uploading your resume okay so now you are in this step let me do it in red now you are here okay now you are uploading your resume and clicking on submit in the background what's happening is chunking is happening word endings is happening and data is stored in Vector local Vector database okay as in this case the application is deployed on AWS the local Vector index is created in in the AWS ec2 instance okay and then you are able to get a final output okay that um he is a seasoned a lead data consultant with 13 years of experience and how am I getting in this format is how I have defined my prompt okay so I have basically defined my prompt in a way that you are a career coach given the resume give me the summary so it's not education experience it's like uh career objectives and then skills and expertise and so on okay I'm not elaborating so this is what I have passed in the code based on which I able to get the response here in this format so now you arew resume is passed chunking is done word embeddings are created local index is created then you got the relevant information along with this this so let's say this is X1 so here you are getting X1 plus all the relevant 
So you take X1 plus R1, call OpenAI, get the output, and pass it to the front end. That is the first flow. The second flow is "ask a question": you have a chatbot, and let's say you ask, "I want to move into AI architect roles — what changes do I need in my resume?" That question is input X2, and for this input and the resume the relevant information is R2; you get X2 + R2, call OpenAI, and get the final output: "To transition into an AI architect role you should focus on highlighting relevant skills, experiences and projects that align with the responsibilities of an AI architect…" Don't worry about the formatting — formatting is a UI skill, a front-end skill, not a generative-AI skill, and I'm not a front-end person, which is why it's not formatted; if we formatted it, it would look better. But you can see it is answering: highlight your relevant skills, emphasize skills related to AI architecture, system design, cloud architecture, integration, cloud platforms, AI frameworks, tools like TensorFlow and PyTorch, emphasize your leadership qualities. It's actually giving a good response. If you ask another question, you invoke the same flow again: that will be X3, and you will get another answer from X3 + R3 — as simple as that. This chat window could have been better, but as I said, I'm not a pro at front-end skills, so I built it with whatever I could. "Expertise in Power BI, mastering its advanced functionalities and data visualization, and develop leadership" — something like that. You can
always test it out and tweak the code, no worries — but architecture-wise everything is good. If yes, I'll go into the code and do a code walkthrough. So far so good? Enjoying the project? Question: "In the first case the prompt was fixed?" Yes, in the first case the prompt is fixed — it is already written in the code, I'll show you that — and in the second case the prompt is whatever you pass from the front end. Let me quickly open the code in Spyder and show you. These are all imports. LangChain, as I told you, is basically a connector: it connects the language model with your Python code, as simple as that. LangChain is a library that provides a lot of things — the connector to OpenAI, chains for building the pipeline, prompt templates, embeddings — it's a framework. The first import is Flask; the second is the os library for working with the operating system; secure_filename makes uploaded file names safe to work with; PyPDF2 is for parsing PDFs; and LangChain is the framework for everything related to the language model. The first piece of code is very simple: I'm calling a CharacterTextSplitter, and this is where I define the chunk size and chunk overlap — the same chunk size and chunk overlap I told you about. So whenever you pass the document, you first read the file and then run the splitter: this is the chunking part. Then I define my embeddings. We have studied Word2Vec and bag-of-words, and there are also many Hugging Face embeddings.
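The chunk_size / chunk_overlap idea behind the character text splitter can be sketched in plain Python. This is only the concept — the real LangChain splitter also respects separators and character boundaries:

```python
def split_text(text, chunk_size=20, chunk_overlap=5):
    """Slide a chunk_size-character window over the text, stepping by
    chunk_size - chunk_overlap so neighbouring chunks share
    chunk_overlap characters of context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("a" * 50)        # 50 chars, window 20, step 15
print([len(c) for c in chunks])      # -> [20, 20, 20, 5]
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.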
Here I'm using a Hugging Face embedding from LangChain's embeddings module. If you go to the LangChain embeddings documentation there are hundreds of embedding models — Vertex AI embeddings are there, OpenAI embeddings are there, Hugging Face embeddings are there, and many more — so you can check all of them out. Going back: this is the part where we create the vector index. I'm calling the FAISS library. What is FAISS? FAISS is Facebook AI Similarity Search — a library for efficient similarity search and clustering of dense vectors. If I go back to the Miro board, the fetching part — getting the relevant information — is done by FAISS. So from your data you first create an index; from that index you call a retriever based on a similarity-search algorithm; and then you create a RetrievalQA chain (from_chain_type), passing the LLM, the chain type and the retriever, and whatever response you get is stored in result. That is the method we created here. Next I initialize the Flask application and set my upload folder, uploads — whatever you upload is stored there; if you upload ten resumés in ten trials, this folder will hold those ten files. Then comes the extraction part: extract text from the PDF. Here I've defined the ChatOpenAI part — the model is GPT-4o, temperature is zero for no variation — and this is where you need to add your API key, which you can get from the OpenAI platform. And this is the fixed template I'm using, the exact same thing I explained to you: my summary template. Then I create a PromptTemplate and pass it the summary template. Now let me go back to the Flask application.
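What FAISS does conceptually is nearest-neighbour search over dense vectors. Here is a toy, stdlib-only version using cosine similarity — FAISS itself uses optimised index structures, this is just the idea behind the retriever step:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, index):
    """index: list of (text, vector) pairs. Return the text whose vector
    is most similar to the query vector -- the 'retriever' step."""
    return max(index, key=lambda item: cosine(query_vec, item[1]))[0]

index = [("experience chunk", [1.0, 0.0, 0.2]),
         ("skills chunk",     [0.1, 1.0, 0.9])]
print(nearest([0.0, 0.9, 1.0], index))   # query vector closest to the skills chunk
```

In the real project, the vectors come from the Hugging Face embedding model, and FAISS replaces the brute-force `max` with an efficient index.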
These are Flask routing techniques. For those who don't understand Flask, there is no way I can explain it right now — as I told you, Python is needed. Even if you don't know Flask you will still be able to follow; there is an end-to-end Flask video on my channel, about 30 minutes — just search "Flask Satjit" or something and you will find it, and that will help you understand this code. But if you don't understand Python I cannot help; Python is mandatory. These routes basically mean that when the Flask application is turned on, the first HTML rendered is index.html — this page here, so when the application starts we see this page. When you click Submit on the home page, the upload route's POST method fires: it redirects the URL for index, reads the file, secures the file name, and saves it into the upload folder. Then it calls the extract-text function, which uses the PyPDF2 library to read the file; after that it calls the text splitter, which splits the file — you can see everything mentioned line by line: first extract text, then the text splitter, then FAISS from_texts, which is where the local vector index is stored. Then it calls the resume-analysis chain, which is just the LLM chain, with the resume prompt — your summary prompt. It's very easy: if you back-trace, you will understand everything. After that you render results.html.
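The fixed summary prompt works like a template with a placeholder that the resume text is substituted into — roughly what the PromptTemplate call in the code does. The template wording below is my paraphrase, not the project's exact text:

```python
# Paraphrased template; the project's real prompt uses its own headings.
SUMMARY_TEMPLATE = (
    "You are an AI career coach. Given the resume below, summarise it "
    "under these headings: career objectives, skills and expertise.\n\n"
    "Resume:\n{resume}"
)

def build_prompt(resume_text):
    """Stand-in for PromptTemplate(...).format(resume=resume_text)."""
    return SUMMARY_TEMPLATE.format(resume=resume_text)

prompt = build_prompt("Lead data consultant, 13 years of experience.")
print(prompt)
```

Changing this one template is how you would change the output format — e.g. adding a "score out of 100" line, as mentioned earlier.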
What is results.html? It is the resume-analysis results page — the one you saw at the start. Next, /ask: that is the "ask a question" route. When you click there, it redirects to /ask, which calls the perform-QA function, and perform-QA calls the large language model to get the most relevant information. There is no internal prompt here: the prompt is whatever you pass from the front end — the query is exactly what you typed. It gets passed in, the RetrievalQA chain is invoked, you get the result, and that's it; then the main function starts the app. Starting the application is very simple: when I give you this code, go to that folder and install the requirements — everybody should know pip install; I won't run it now, you can install them individually — and then just run python app.py. It takes some time the very first time, and within a couple of seconds the application is ready and running, so we go and open localhost. Oh — somehow it restarted; anyway, it will be up soon. While it loads: FAISS is the similarity algorithm used to get the relevant information from the vector database. OK, the application is up, but I cannot show it working here, because I'm in Hong Kong it will fail. Also, this is where you need to pass your API key — make sure you pass the key here. I have not passed mine, so whatever I do will fail with an error that the key is not right.
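On the API-key point: one common alternative to pasting the key into the source file — an assumption on my part, not what the project's code does — is to read it from an environment variable:

```python
import os

def get_api_key():
    """Read the OpenAI key from the environment instead of hard-coding it."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before starting the app")
    return key

os.environ["OPENAI_API_KEY"] = "sk-demo-only"   # demo value, not a real key
print(get_api_key())
```

This way the same code runs on your laptop and on the EC2 instance without editing the file, and the key never ends up in the repository.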
Or something like that, I don't know the exact message — but the application is running. I have already deployed it on AWS EC2 and showed you that it works; it's only throwing this error here because I'm in Hong Kong. For you it will work — my deployed application is working. OK, I think that's all about it. I'll keep the call open for a few more minutes in case you have any questions. "Where to get the API key?" — the OpenAI platform. That's all for this entire workshop. It was quick, but I believe a lot of information was passed on to you, and now you know what your next things to learn should be. Mastery of NLP and deep learning is not needed, but I was not able to explain the internal architecture of Transformers, because Transformers and BERT are ultimately neural networks, and without a machine learning background it would be difficult — if I just went ahead and explained it, you would not be able to follow. To become a developer, some machine learning and deep learning is needed, but not to that extent: even if you don't understand ML and DL, you will still be able to create a generative AI project like this one, because you are not actually building neural networks here — the neural networks run in the background, and you don't need to understand Transformer internals to work on this project. But it's always good to learn those things. People who are already part of my AI program: in-depth videos are provided there, Transformers and NLP explained in layman's terms. For others it's not mandatory, but in case you want to explore my program, reach out to me and I will help you out. "Can you recommend a book on ML?" — I don't recommend any books, sorry, because I don't read books myself; I read blogs and research papers. So I'm not the
right person to ask about books. "Tips for getting a job as an AI engineer, gen-AI engineer or scientist?" — they are similar roles. (And sorry, it's not Sashank, it's Satjit.) Tips for getting an AI engineering job: as a fresher or as an experienced professional? As a fresher you should be open to everything — the more openness you have, the more chances you get. And portfolio projects are very, very important for freshers: I don't spend more than two seconds going through resumés when I hire an intern, or even for a fresher opening — two seconds, that's it — and if I see some random, easy projects, it doesn't attract me. For experienced professionals, background is very important — which background you are coming from — because you will eventually use your experience to migrate into the data science and AI area. The most important thing: go and talk to your data/AI team. If you are completely isolated and don't know what that team does, start building relationships with them, talk to them, and understand the business problems they are solving. Then be picky about projects — you already know what your company is solving — so pick projects wisely, think about which types of projects matter for your background, and work on those. As simple as that. Learning-wise, the data science and AI domain is not that simple right now, because companies have a lot of requirements and expectations. Five or six years back it was easy; honestly, when I transitioned eight years back, machine learning alone was enough — I did not know deep learning, I learned it and a few other things on the go. But now, if somebody who only knows machine learning goes to an interview, they will be bashed, because expectations are huge. You should be aware of everything, and in order to learn
everything — Python, statistics, SQL, machine learning, deep learning, NLP, Transformers, computer vision — there is a good amount of time you need to invest. OK: "Agents?" AI agents are something that help you with automations — automating processes in your company. For example, I was giving the Bloomberg example earlier: in that project we are actually using agents; it's a real project at one of my client companies. They use Bloomberg, basically a chat tool where their agents talk with customers — imagine customer service — and from those conversations they need a few details: the product details, which product is being discussed, the customer sentiment (whether the customer is interested or not), the cost of the product, things like that. The project was to automate this process: store the data from the chat in a structured Excel format and eventually push it to a database. We used generative AI there. The chats are automatically extracted using agents, and then — imagine the GPT architecture sitting here — the model does all the work: extracting structured data from the unstructured chat, formatting it, and sending it to Excel, and this runs on a daily basis. So agents are basically automations, that's it. "What are the best projects for a fresher to secure a data scientist or ML role?" These days nobody is using plain ML, trust me — ML is now basically a part of data analytics, to be honest.
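The Bloomberg-style automation described above — pull fields out of an unstructured chat, build structured rows, push them to Excel/CSV — can be sketched with plain regex extraction. The field names and patterns here are invented for illustration; the real project uses an LLM for the unstructured-to-structured step:

```python
import csv, io, re

def extract_fields(chat):
    """Toy unstructured -> structured step: pull product and cost
    out of a chat transcript with regexes (illustrative only)."""
    product = re.search(r"product[:\s]+(\w+)", chat, re.I)
    cost = re.search(r"\$(\d+)", chat)
    return {"product": product.group(1) if product else "",
            "cost": cost.group(1) if cost else ""}

def to_csv(rows):
    """Structured rows -> CSV text: the 'push to Excel' step."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["product", "cost"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

row = extract_fields("Customer asked about product: Widget, quoted $250")
print(to_csv([row]))
```

Swapping the regexes for an LLM call is what turns this from brittle pattern-matching into the generative-AI automation described in the session.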
In data analytics, companies are expecting you to have predictive-analytics knowledge. So rather than writing ML projects, go for NLP projects or maybe gen-AI projects. For example, projects where you can mention BERT: try out BERT, ALBERT, different models, see which one works for text summarization or translation, pick that project, and write it down. For generative AI, open up your mind and explore use cases — there are many generative AI project ideas, like AI insurance-claim prediction, or this project that I showed you — so do your own research and start working on projects, but ideally a RAG project, a RAG-pipeline project, and then write it down. ML projects: not recommended. On learning AI agents: it's a process. There is no platform to learn AI agents as such — it's just a concept, an automation concept for getting tasks done — so there is no particular tool to learn. Once you start working the non-agent way, on proofs of concept, you will gradually understand more about AI agents. OK, that's all about it. Thank you all for being part of this session; I hope it was fruitful. In case you have any other questions, let me know in the WhatsApp group. We plan to come up with other workshops as well, especially for people in this group, since you are already aware to a certain extent — maybe another workshop specific to projects, where I can go deep into Llama and Mistral architectures and probably take a project and compare different models, which would be like a second version of this workshop. I still need to think about what I can do, but that will be only for you guys, not for everyone — otherwise everyone will say, "I don't understand this, why is this guy explaining this?" We'll plan it only for this group, but again, the prerequisite is that
we have at least 10-15 students; otherwise it's a waste of time for me. I'm also planning sessions in other areas — for example on AWS tools and on Azure tools, to understand more about Azure AI Services and things like that. I'll probably send you some information once we finalize. Thank you for your time; I hope you enjoyed it. There will be an anonymous survey sent to you — names will not be collected — so please give honest feedback on how I could have improved. Don't blindly give negative comments; think about all the aspects — the time was short, and the monetary ask was not huge — so consider everything and give genuine feedback. I'll send the feedback link in the group. And yes, Kartik — two weeks? Definitely not; it will be more like three to four weeks, because I'm traveling next week and it's Christmas, so end of December is not possible — maybe mid-January or so. OK, thank you all. Please practice as much as possible; the project code will be provided to you as part of the GitHub link. Any questions, feel free to ask me in the group. Thank you, bye-bye.