Imagine you have a page with some text and some links, and only some of the links are RSS feeds.
But you want those RSS feeds for a newsreader of some kind.
Just copy the whole page and feed it to an LLM with the right instructions.
In this case I use sigoden’s AIChat with Cerebras’ Llama 4 Scout model:
p | aichat --model cer:llama-4-scout-17b-16e-instruct just provide the rss url and feed urls and nothing else provide one url per line
p is just a command that outputs whatever I copied into
my system clipboard.
Here are more examples of the same approach with other command-line LLM tools:
Using OpenAI’s GPT with fabric:
p | fabric --pattern extract_rss_feeds
Using Ollama with a local model:
p | ollama run llama3.2:3b "Extract only RSS feed URLs from this content. Output one URL per line with no other text."
Let’s return to the initial example. Maybe we want some kind of validation that the URLs returned by the LLM are actually RSS feeds.
With Nushell that is easy to do:
p | aichat --model cer:llama-4-scout-17b-16e-instruct "extract rss urls only" | lines | where $it =~ 'feed|rss|xml'
Now, this validation is far from perfect: it only checks whether the URL text contains “feed”, “rss”, or “xml”, and it is no guarantee that the feeds are actually functional. Verifying that would take more code, but it could certainly be done in Nushell rather quickly.
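A stricter check would fetch each URL and look at what comes back. The sketch below is in Python rather than Nushell, and it is only one possible way to do it: the names looks_like_feed and url_is_feed are made up for this example, and the rule "the root element is rss, feed, or RDF" is an assumption about what counts as a feed.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Root element names (namespace stripped) used by the common feed formats:
# RSS 2.0 uses <rss>, Atom uses <feed>, RSS 1.0 uses <rdf:RDF>.
FEED_ROOTS = {"rss", "feed", "RDF"}


def looks_like_feed(body):
    """Return True if body (str or bytes) is XML with a feed-style root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # ElementTree writes namespaced tags as "{namespace}name"; keep only the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS


def url_is_feed(url, timeout=10):
    """Fetch url and report whether the response body looks like a feed."""
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read()
    except (OSError, ValueError):
        # Network errors, HTTP errors, timeouts, and malformed URLs all count as "no".
        return False
    return looks_like_feed(body)
```

Calling url_is_feed on each line of the LLM output would then keep only the URLs that really serve RSS or Atom. It says nothing about whether the feed is still being updated.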
There are plenty of other things LLMs can easily find and present for you that would otherwise be time-consuming to dig out of the surrounding text.
Let’s have a look at a few more examples:
Extracting Email Addresses
Suppose we have some pages with a lot of text and a few email addresses scattered around.
Feeding the content of these pages to an LLM:
p | aichat --model cer:llama-4-scout-17b-16e-instruct extract email addresses one per line
Would output something like:
support@example.com
admin@example.net
team@example.io
These results are even easier to run a simple validation on:
p | aichat --model cer:llama-4-scout-17b-16e-instruct extract email addresses one per line | lines | where $it =~ '@'
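Checking for an “@” also lets through lines like “Here are the addresses I found @ the page”. A slightly stricter filter could look like the following Python sketch. The pattern and the function name keep_emails are assumptions for illustration; the pattern is deliberately simple and is not a full email validator.

```python
import re

# Deliberately simple pattern: something@domain.tld with no whitespace.
# This is an illustrative assumption, not a complete RFC 5322 validator.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def keep_emails(llm_output: str) -> list[str]:
    """Return the lines of llm_output that consist of one email-like string."""
    candidates = (line.strip() for line in llm_output.splitlines())
    return [c for c in candidates if EMAIL_RE.fullmatch(c)]
```

Using fullmatch means a line only passes when the whole line is an address, so any chatty preamble the model adds is dropped.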
Extracting Names
LLMs are very useful for extracting data like names since they kinda understand what names are.
Suppose we have a page with some text and names:
Our team consists of John Smith, Jane Doe, and Bob Johnson.
You can reach out to John or Jane for more information.
Feeding this to an LLM:
p | aichat --model cer:llama-4-scout-17b-16e-instruct extract names one per line
Output:
John Smith
Jane Doe
Bob Johnson
John
Jane
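Note that the output lists “John” and “Jane” a second time, because the text mentions them again by first name only. If that is unwanted, a small post-processing step can drop them. The Python sketch below assumes that a name whose words all appear in a longer name refers to the same person, which will be wrong if the text really has two different Johns. The function name drop_partial_names is made up for this example.

```python
def drop_partial_names(names: list[str]) -> list[str]:
    """Drop exact repeats and names whose words are a strict subset of another name."""
    # dict.fromkeys removes exact duplicates while keeping the original order.
    unique = list(dict.fromkeys(n.strip() for n in names if n.strip()))
    word_sets = [set(n.lower().split()) for n in unique]
    return [
        name
        for name, words in zip(unique, word_sets)
        # "words < other" is True only for a proper subset, e.g. {"john"} < {"john", "smith"}.
        if not any(words < other for other in word_sets)
    ]
```

The comparison works on whole words, so “John” is not confused with the “Johnson” in “Bob Johnson”.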
One last example that anyone working with a large corpus of text might find useful is removing stopwords.
Removing Stopwords
Suppose we have a page with some text and we want to remove common stopwords like “the”, “and”, etc.:
This is an example sentence with common words like the and a.
Feeding this to an LLM:
p | aichat --model cer:llama-4-scout-17b-16e-instruct remove stopwords
Output:
example sentence common words like
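For comparison, the same step without an LLM is only a few lines. This is a Python sketch with a tiny hand-picked stopword list that is assumed just for this sentence; real work would use a proper list for the language in question.

```python
import string

# A tiny hand-picked stopword list, assumed for this example only.
STOPWORDS = {"a", "an", "and", "is", "the", "this", "with"}


def remove_stopwords(text: str) -> str:
    """Drop every word of text that appears in STOPWORDS (case-insensitive)."""
    kept = []
    for token in text.split():
        # Strip surrounding punctuation so that "a." is compared as "a".
        word = token.strip(string.punctuation)
        if word and word.lower() not in STOPWORDS:
            kept.append(word)
    return " ".join(kept)
```

Run on the example sentence above, this returns the same five words as the LLM did. It is also deterministic and free to run.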
There are certainly more efficient ways to solve these problems, not least the stopwords example.
But LLMs make it easy for everyone to clean data and extract the information they want, because the prompts can be written in plain, everyday language.
And Nushell makes it easy to validate this extracted data and continue working with it right from the command line.