<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://srikarkashyap.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://srikarkashyap.github.io/" rel="alternate" type="text/html" /><updated>2026-04-19T19:30:56-04:00</updated><id>https://srikarkashyap.github.io/feed.xml</id><title type="html">Srikar Kashyap Pulipaka</title><subtitle>Software Engineering, Machine Learning Systems and more</subtitle><author><name>Srikar Kashyap Pulipaka</name><email>srikar.kashyap@gmail.com</email></author><entry><title type="html">How we won the PAN 2024 Text Classification challenge</title><link href="https://srikarkashyap.github.io/how-we-won-PAN2024/" rel="alternate" type="text/html" title="How we won the PAN 2024 Text Classification challenge" /><published>2024-07-31T00:00:00-04:00</published><updated>2024-07-31T00:00:00-04:00</updated><id>https://srikarkashyap.github.io/how-we-won-PAN2024</id><content type="html" xml:base="https://srikarkashyap.github.io/how-we-won-PAN2024/"><![CDATA[<h2 id="background">Background</h2>

<h2 id="the-competition">The Competition</h2>

<h2 id="research-process">Research Process</h2>

<h2 id="results">Results</h2>

<h2 id="outro">Outro</h2>]]></content><author><name>Srikar Kashyap Pulipaka</name><email>srikar.kashyap@gmail.com</email></author><summary type="html"><![CDATA[Background]]></summary></entry><entry><title type="html">Getting started with Selenium Webdriver and Requests in Python</title><link href="https://srikarkashyap.github.io/posts/2024/06/web-scraping-selenium/" rel="alternate" type="text/html" title="Getting started with Selenium Webdriver and Requests in Python" /><published>2024-06-13T00:00:00-04:00</published><updated>2024-06-13T00:00:00-04:00</updated><id>https://srikarkashyap.github.io/posts/2024/06/selenium_tutorial</id><content type="html" xml:base="https://srikarkashyap.github.io/posts/2024/06/web-scraping-selenium/"><![CDATA[<p>In this short tutorial, let’s look at the US Patent and Trademark Office (USPTO) website and scrape the patent database using a keyword search. We will use Selenium WebDriver to scrape the data. We will then use the Requests library to download the individual patent PDF documents.</p>

<p><small>Compiled by Srikar Kashyap Pulipaka</small></p>

<p><small>Last Updated: 13 June 2024</small></p>

<h1 id="part-1-scraping-the-uspto-website-for-patent-data">Part 1: Scraping the USPTO website for Patent Data</h1>

<h3 id="what-is-selenium-webdriver">What is Selenium WebDriver?</h3>

<p>Selenium WebDriver is a browser automation framework that lets you interact with a web page through a real, fully featured browser. It is primarily used for automated testing of web applications, but it works equally well for web scraping. It is one of several components of the Selenium suite.</p>

<h3 id="installation-and-setup">Installation and Setup</h3>

<p>First, install the Selenium Python bindings and the Chrome WebDriver. You can install Selenium using the following command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>selenium
</code></pre></div></div>

<p>You can download the Chrome WebDriver from the following link: <a href="https://googlechromelabs.github.io/chrome-for-testing/">Chrome WebDriver</a></p>

<p><strong>Note:</strong> The Chrome WebDriver should be <em>the same version</em> as the Chrome browser installed on your system. You can check your Chrome version by going to <code class="language-plaintext highlighter-rouge">chrome://settings/help</code>.</p>

<p>Once you download the Chrome WebDriver, extract the file and place it in the same directory as your Python script.</p>
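<p>The version-match requirement above can be sanity-checked with a small helper. This is a minimal sketch; the version strings below are hypothetical (read yours from chrome://settings/help and from running chromedriver --version):</p>

```python
# Chrome and chromedriver are compatible when their major versions
# match. The version strings below are hypothetical examples.
def same_major_version(chrome_version: str, driver_version: str) -> bool:
    return chrome_version.split(".")[0] == driver_version.split(".")[0]

print(same_major_version("126.0.6478.62", "126.0.6478.55"))  # True
print(same_major_version("126.0.6478.62", "125.0.6422.78"))  # False
```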

<h3 id="importing-the-necessary-libraries">Importing the necessary libraries</h3>

<p>We start by importing the necessary libraries for scraping. (The Requests library, which we will use to download the PDF documents, is imported later in Part 2.)</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.common.keys</span> <span class="kn">import</span> <span class="n">Keys</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
</code></pre></div></div>

<h3 id="keyword-definition">Keyword Definition</h3>

<p>Let’s define the keyword to be used to search the USPTO database. In this case, we will use the keyword “semiconductor”.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">keyword</span> <span class="o">=</span> <span class="s">"semiconductor"</span>
</code></pre></div></div>

<h3 id="initializing-the-webdriver-and-navigating-to-the-uspto-website">Initializing the WebDriver and Navigating to the USPTO Website</h3>

<p>We will initialize the WebDriver and navigate to the USPTO website. We will then search for the keyword “semiconductor” in the search bar, and press the search/enter button.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="p">.</span><span class="n">Chrome</span><span class="p">()</span>
<span class="n">driver</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"https://ppubs.uspto.gov/pubwebapp/static/pages/ppubsbasic.html"</span><span class="p">)</span>
<span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="s">"searchText1"</span><span class="p">).</span><span class="n">send_keys</span><span class="p">(</span><span class="s">"semiconductor"</span><span class="p">)</span>
<span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="s">"searchText1"</span><span class="p">).</span><span class="n">send_keys</span><span class="p">(</span><span class="n">Keys</span><span class="p">.</span><span class="n">RETURN</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="scraping-and-saving-the-data">Scraping and Saving the Data</h3>

<p>We will scrape the data from the search results and save it into a Pandas DataFrame, then write the DataFrame to a CSV file. For this tutorial, we will run the script only over the first six pages of the search results (the loop breaks once the page count exceeds 5). You can eventually run it for all the pages by removing the count condition.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">master_df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">()</span>
<span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
    <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_html</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">driver</span><span class="p">.</span><span class="n">page_source</span><span class="p">))[</span><span class="mi">0</span><span class="p">]</span>
    <span class="n">master_df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">([</span><span class="n">master_df</span><span class="p">,</span> <span class="n">df</span><span class="p">])</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="s">"paginationNextItem"</span><span class="p">).</span><span class="n">click</span><span class="p">()</span>
    <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
        <span class="k">break</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'Size of collection so far:'</span><span class="p">,</span> <span class="n">master_df</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="k">if</span> <span class="n">count</span> <span class="o">&gt;</span> <span class="mi">5</span><span class="p">:</span>
        <span class="k">break</span>
<span class="n">master_df</span><span class="p">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">drop</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">master_df</span><span class="p">.</span><span class="n">to_csv</span><span class="p">(</span><span class="s">'patents.csv'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>

<p>Let’s look at what happens in this piece of code:</p>

<ol>
  <li>We initialize a master dataframe using the Pandas library. This dataframe will store the data from all the pages.</li>
  <li>We also instantiate a count variable to keep track of the number of pages we have scraped.</li>
  <li>We start a while loop that runs until the page count exceeds 5 (or until there is no next page). This loop scrapes the data from each page and appends it to the master dataframe.</li>
  <li>We <em>sleep</em> for 3 seconds after the page loads to give the JavaScript-rendered elements enough time to appear. Adjust this delay depending on your internet speed and the complexity of the website.</li>
  <li>We then pass the page source to the Pandas read_html function. This function parses every table in the HTML and returns a list of DataFrames. The table we need is the first one in that list, so we select the first element.</li>
  <li>We concat this new dataframe with the master dataframe.</li>
  <li>We try to click the next-page button. If it is not present, the click raises an exception and we break out of the loop. This is how we know that we have reached the end of the search results.</li>
  <li>We increment the count variable by 1.</li>
  <li>Once the value of count exceeds 5, we break out of the loop (comment out this check if you want to scrape all the pages).</li>
  <li>We reset the index of the master dataframe as the index will be duplicated after each concatenation.</li>
  <li>We save the master dataframe into a CSV file.</li>
</ol>
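<p>Step 5 can be tried in isolation, without a browser. A minimal sketch with a stand-in for driver.page_source (the markup below is hypothetical; the real USPTO results table has different columns):</p>

```python
from io import StringIO

import pandas as pd

# A tiny stand-in for driver.page_source: two tables, of which
# only the first holds the results we want.
html = """
<table>
  <tr><th>Result #</th><th>Title</th></tr>
  <tr><td>1</td><td>First patent</td></tr>
  <tr><td>2</td><td>Second patent</td></tr>
</table>
<table>
  <tr><th>Other</th></tr>
  <tr><td>ignored</td></tr>
</table>
"""

# read_html parses every table into a DataFrame; we keep the first,
# exactly as the scraper does. Newer pandas versions prefer a
# StringIO wrapper over a bare string.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(len(tables))        # 2
print(df.shape[0])        # 2
print(list(df.columns))   # ['Result #', 'Title']
```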

<h3 id="final-code-run-only-this-code">Final Code (Run Only This Code)</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># import the required libraries
</span><span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.common.keys</span> <span class="kn">import</span> <span class="n">Keys</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="c1"># set the keyword
</span><span class="n">keyword</span> <span class="o">=</span> <span class="s">"semiconductor"</span>

<span class="c1"># open the browser and navigate to the website. Enter the keyword and click on search
</span><span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="p">.</span><span class="n">Chrome</span><span class="p">()</span>
<span class="n">driver</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"https://ppubs.uspto.gov/pubwebapp/static/pages/ppubsbasic.html"</span><span class="p">)</span>
<span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="s">"searchText1"</span><span class="p">).</span><span class="n">send_keys</span><span class="p">(</span><span class="s">"semiconductor"</span><span class="p">)</span>
<span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="s">"searchText1"</span><span class="p">).</span><span class="n">send_keys</span><span class="p">(</span><span class="n">Keys</span><span class="p">.</span><span class="n">RETURN</span><span class="p">)</span>

<span class="c1"># create an empty dataframe to store the data
</span><span class="n">master_df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">()</span>

<span class="c1"># loop through the pages and extract the data
</span><span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>

<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
    <span class="c1"># wait for the page to load
</span>    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
    <span class="c1"># read the data from the page and append it to the master
</span>    <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_html</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">driver</span><span class="p">.</span><span class="n">page_source</span><span class="p">))[</span><span class="mi">0</span><span class="p">]</span>
    <span class="n">master_df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">([</span><span class="n">master_df</span><span class="p">,</span> <span class="n">df</span><span class="p">])</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="s">"paginationNextItem"</span><span class="p">).</span><span class="n">click</span><span class="p">()</span>
    <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
        <span class="k">break</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'Size of collection so far:'</span><span class="p">,</span> <span class="n">master_df</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="k">if</span> <span class="n">count</span> <span class="o">&gt;</span> <span class="mi">5</span><span class="p">:</span>
        <span class="k">break</span>
<span class="n">driver</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
<span class="n">master_df</span><span class="p">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">drop</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">master_df</span><span class="p">.</span><span class="n">to_csv</span><span class="p">(</span><span class="s">'patents.csv'</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Size of collection so far: 50
Size of collection so far: 100
Size of collection so far: 150
Size of collection so far: 200
Size of collection so far: 250
Size of collection so far: 300
</code></pre></div></div>

<p>Let’s have a look at the data we collected.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">master_df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>

<div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Result #</th>
      <th>Document/Patent number</th>
      <th>Display</th>
      <th>Title</th>
      <th>Inventor name</th>
      <th>Publication date</th>
      <th>Pages</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>US-20240193696-A1</td>
      <td>Preview PDF</td>
      <td>PROACTIVE WEATHER EVENT COMMUNICATION SYSTEM A...</td>
      <td>Wyatt; Amber et al.</td>
      <td>2024-06-13</td>
      <td>21</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>US-20240188889-A1</td>
      <td>Preview PDF</td>
      <td>FLASH LED AND HEART RATE MONITOR LED INTEGRATI...</td>
      <td>Tankiewicz; Szymon Michal et al.</td>
      <td>2024-06-13</td>
      <td>28</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>US-20240193523-A1</td>
      <td>Preview PDF</td>
      <td>VIRTUAL CAREER MENTOR THAT CONSIDERS SKILLS AN...</td>
      <td>O'Donncha; Fearghal et al.</td>
      <td>2024-06-13</td>
      <td>16</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>US-20240193519-A1</td>
      <td>Preview PDF</td>
      <td>SYSTEMS AND METHODS FOR SYSTEM-WIDE GRANULAR A...</td>
      <td>Holovacs; Jeremy</td>
      <td>2024-06-13</td>
      <td>34</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5</td>
      <td>US-20240190459-A1</td>
      <td>Preview PDF</td>
      <td>METHODS AND SYSTEMS FOR VEHICLE CONTROL UNDER ...</td>
      <td>Mamchuk; Tetyana V. et al.</td>
      <td>2024-06-13</td>
      <td>23</td>
    </tr>
  </tbody>
</table>
</div>

<h1 id="part-2-downloading-the-pdf-patent-documents-using-the-requests-library">Part 2: Downloading the PDF Patent Documents using the Requests Library</h1>

<p>Now that we have the patents data, we can use the following code to extract/download the patents in PDF format. We will be using the Requests library to download the PDF documents using simple HTTP requests.</p>

<p>It turns out (to our advantage) that the USPTO website stores the PDF patent documents at a predictable URL of the format https://ppubs.uspto.gov/dirsearch-public/print/downloadPdf/XXXXXXXX.pdf, where XXXXXXXX is the patent number. We can use this to download the PDFs of the patents we are interested in.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">requests</span>

<span class="n">url</span> <span class="o">=</span> <span class="s">"https://ppubs.uspto.gov/dirsearch-public/print/downloadPdf/{}"</span>

<span class="c1"># sample 4 rows
</span><span class="n">sample</span> <span class="o">=</span> <span class="n">master_df</span><span class="p">.</span><span class="n">head</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sample</span><span class="p">.</span><span class="n">iterrows</span><span class="p">():</span>
    <span class="n">patent_number</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s">"Document/Patent number"</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="s">"-"</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span>
    <span class="n">formatted_url</span> <span class="o">=</span> <span class="n">url</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">patent_number</span><span class="p">)</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">formatted_url</span><span class="p">)</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">row</span><span class="p">[</span><span class="s">'Document/Patent number'</span><span class="p">]</span><span class="si">}</span><span class="s">.pdf"</span><span class="p">,</span> <span class="s">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">f</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">content</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Downloaded </span><span class="si">{</span><span class="n">row</span><span class="p">[</span><span class="s">'Document/Patent number'</span><span class="p">]</span><span class="si">}</span><span class="s">.pdf"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Downloaded US-20240193696-A1.pdf
Downloaded US-20240188889-A1.pdf
Downloaded US-20240193523-A1.pdf
Downloaded US-20240193519-A1.pdf
</code></pre></div></div>

<p>Let’s break down the code:</p>

<ol>
  <li>We first import the necessary library: requests.</li>
  <li>We take the first four rows of the master_df DataFrame as a sample.</li>
  <li>We iterate over each row in the sample DataFrame.</li>
  <li>We extract the patent number from the “Document/Patent number” column. Since the patent number is of the format “US-XXXXX-XX”, we split the text using the “-“ character and extract the second part.</li>
  <li>We format the URL using the extracted patent number.</li>
  <li>We make a GET request to the formatted URL.</li>
  <li>We write the content of the response to a PDF file named after the patent number. We open the file in write-binary mode (“wb”) since the PDF is binary data.</li>
  <li>We print a message indicating that the PDF file has been downloaded.</li>
</ol>
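<p>Steps 4 and 5 can be checked on their own with a sample value from the results table above:</p>

```python
# Extract the bare patent number from a "Document/Patent number" value
# and build the download URL (sample value from the results table above).
url = "https://ppubs.uspto.gov/dirsearch-public/print/downloadPdf/{}"

document_number = "US-20240193696-A1"
patent_number = document_number.split("-")[1]
formatted_url = url.format(patent_number)

print(patent_number)   # 20240193696
print(formatted_url)
```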

<h3 id="selenium-webdriver-vs-requests-library">Selenium Webdriver vs. Requests Library</h3>

<p>Some personal notes on the choice between Selenium WebDriver and the Requests library for web scraping:</p>

<p><strong>When to use Selenium WebDriver</strong></p>
<ul>
  <li>When the website is dynamic and requires JavaScript to load the content.</li>
  <li>When the website requires user interaction (e.g., clicking buttons, filling forms).</li>
</ul>

<p><strong>When to use the Requests library</strong></p>
<ul>
  <li>When the required elements are present in the source code of the webpage on load. In this case, you can directly scrape the data using the Requests library.</li>
  <li>When the website is static and does not require JavaScript to load the content.</li>
  <li>When the website does not require user interaction.</li>
</ul>
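<p>A quick way to decide between the two, in code form. A minimal offline sketch with two hypothetical responses:</p>

```python
# If the needle is already in the raw HTML the server sends, plain
# Requests is enough; if the page builds it with JavaScript, you
# likely need Selenium. Both responses below are hypothetical.
static_html = "<html><body><table><tr><td>US-20240193696-A1</td></tr></table></body></html>"
dynamic_html = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"

needle = "US-20240193696-A1"
print(needle in static_html)   # True  -> Requests is enough
print(needle in dynamic_html)  # False -> likely needs Selenium
```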

<p><strong>Tip (that usually works):</strong> If you can find the required data/element in the page source after pressing Ctrl+U, you can scrape the data directly with the Requests library. If you cannot find it there, you will likely need Selenium WebDriver.</p>]]></content><author><name>Srikar Kashyap Pulipaka</name><email>srikar.kashyap@gmail.com</email></author><category term="python" /><category term="tutorial" /><category term="web-scraping" /><summary type="html"><![CDATA[In this short tutorial, let’s look at the US Patent and Trademark Office (USPTO) website and scrape the patent database using a keyword search. We will use Selenium WebDriver to scrape the data. We will then use the Requests library to download the individual patent PDF documents.]]></summary></entry><entry><title type="html">Problems and Nuisances</title><link href="https://srikarkashyap.github.io/posts/2021/01/problems-and-nuisances/" rel="alternate" type="text/html" title="Problems and Nuisances" /><published>2021-01-15T00:00:00-05:00</published><updated>2021-01-15T00:00:00-05:00</updated><id>https://srikarkashyap.github.io/posts/2021/01/problems-and-nuisances</id><content type="html" xml:base="https://srikarkashyap.github.io/posts/2021/01/problems-and-nuisances/"><![CDATA[<p><em>Identifying the user issues worth solving is very crucial but often ignored</em></p>

<p>In the course of my study on product management, I’ve identified two categories of user issues: <strong>Problems</strong> and <strong>Nuisances</strong>.</p>

<p><strong>Problems</strong> are difficulties that affect the user so much that they are willing to spend time and money to solve them.</p>

<p><strong>Nuisances</strong> feel like problems at the moment but are quickly ignored or forgotten.</p>

<h2 id="lets-take-an-example">Let’s take an example</h2>

<p>My mobile hangs almost once every month and needs a restart. If somebody asks me about this, I would rave for half an hour about how terrible my phone is. If they ask me if I would buy their <em>hang-free phone</em>, I would instinctively say yes!</p>

<p>But will I? No! Because it’s a nuisance that seems important in that moment. I’m totally fine with my phone the rest of the time.</p>

<p><strong>This misidentification of nuisances as problems seems to plague non-tech companies the most</strong>. A laundry list of features is made and the development outsourced to a vendor, with negligible or zero user surveys or demand identification. Most of these <em>features</em> are often obscure and unused.</p>

<p>But don’t throw away the nuisances yet! <strong>In many cases, nuisances solved with 10X efficiency have a potential niche market for themselves</strong>. These markets rarely overlap with the original markets.</p>

<p>A clock that slows down every month by five minutes? Nuisance. A clock that doesn’t slow down for decades or centuries? <strong>Atomic clock</strong> (with important scientific applications).</p>

<p>Can you think of any more examples that prove (or disprove) this distinction?</p>

<p><strong>TL;DR:</strong> Solve problems. Unless you can solve the nuisance 10X better. Then do that!</p>

<h2 id="recommended-reading-on-identifying-problems">Recommended reading on identifying problems</h2>
<ul>
  <li>Talking to Humans</li>
  <li>The Mom Test</li>
</ul>]]></content><author><name>Srikar Kashyap Pulipaka</name><email>srikar.kashyap@gmail.com</email></author><category term="product-management" /><category term="thoughts" /><category term="non-tech" /><summary type="html"><![CDATA[Identifying the user issues worth solving is very crucial but often ignored]]></summary></entry></feed>