Pyspark command not recognised

2024/9/17 3:57:37

I have Anaconda installed and have also downloaded Spark 1.6.2. I am following the instructions from this answer to configure Spark for Jupyter.

I have downloaded and unzipped the spark directory as

~/spark

Now when I cd into this directory and into bin I see the following

SFOM00618927A:spark $ cd bin
SFOM00618927A:bin $ ls
beeline         pyspark         run-example.cmd     spark-class2.cmd    spark-sql       sparkR
beeline.cmd     pyspark.cmd     run-example2.cmd    spark-shell     spark-submit        sparkR.cmd
load-spark-env.cmd  pyspark2.cmd        spark-class     spark-shell.cmd     spark-submit.cmd    sparkR2.cmd
load-spark-env.sh   run-example     spark-class.cmd     spark-shell2.cmd    spark-submit2.cmd

I have also added the environment variables as mentioned in the above answer to my .bash_profile and .profile

Before anything else, I want to check that the pyspark command works in the shell from the spark/bin directory.

So I do this after doing cd spark/bin

SFOM00618927A:bin $ pyspark
-bash: pyspark: command not found

According to the answer, after following all the steps I should be able to run

pyspark 

in a terminal from any directory, and it should start a Jupyter notebook with the Spark engine. But pyspark does not even work in the shell, let alone in a Jupyter notebook.

Please advise what is going wrong here.

Edit:

I did

open .profile 

in my home directory, and this is what it contains:

export PATH=/Users/854319/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/Users/854319/spark/bin
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark
Answer

1- You need to set JAVA_HOME and add the Spark bin directory to your PATH so that the shell can find them. After setting them in your .profile, you may want to run

source ~/.profile

to activate the setting in the current session. From your comment I can see you're already having the JAVA_HOME issue.
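After sourcing the file, it can help to confirm that the settings actually reached the current session. The following is a minimal sketch, not part of the original answer: path_contains is a hypothetical helper name, and $HOME/spark/bin is assumed from the ~/spark location given in the question.

```shell
# Hypothetical helper: report whether a directory is listed in a PATH-style string.
# $1 is the colon-separated list, $2 is the directory to look for.
path_contains() {
    case ":$1:" in
        *":$2:"*) echo "yes" ;;
        *) echo "no" ;;
    esac
}

# Assumed location: the question unpacked Spark into ~/spark
echo "spark/bin on PATH: $(path_contains "$PATH" "$HOME/spark/bin")"

# Show JAVA_HOME, or a placeholder if it is not set in this session
echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
```

If the first line reports "no" or JAVA_HOME is unset, the exports were probably not read by the shell you are using.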

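Note that if you have a .bash_profile or .bash_login, your .profile will not be read: a bash login shell reads only the first of ~/.bash_profile, ~/.bash_login and ~/.profile that it finds.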
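To see which of these files bash would pick up, a small check along the following lines may help. This is a sketch under the assumption that the login shell is bash; first_login_file is a hypothetical helper name, not a bash built-in.

```shell
# Hypothetical helper: print the first login startup file that bash would
# read from the given home directory, following bash's documented order.
first_login_file() {
    home_dir=$1
    for name in .bash_profile .bash_login .profile; do
        if [ -r "$home_dir/$name" ]; then
            echo "$name"
            return 0
        fi
    done
    echo "none"
}

first_login_file "$HOME"
```

If this prints .bash_profile while your exports live in .profile, either move the exports or source .profile from .bash_profile.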

2- When you are in spark/bin you need to run

./pyspark

to tell the shell that the target is in the current folder.
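The reason is that a bare command name is looked up only in the directories listed in PATH, and the current directory is normally not one of them. The sketch below illustrates this with a throwaway stand-in script; the name mytool_demo and the temporary directory are made up for the demonstration and have nothing to do with Spark.

```shell
# Create a throwaway directory holding a stand-in executable (hypothetical name)
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "mytool_demo ran"\n' > "$demo_dir/mytool_demo"
chmod +x "$demo_dir/mytool_demo"
cd "$demo_dir"

# A bare name is searched for only in the PATH directories
if command -v mytool_demo >/dev/null 2>&1; then
    bare_lookup="found"
else
    bare_lookup="not found"
fi

# A name containing a slash is used as a path, so no PATH search happens
explicit_output=$(./mytool_demo)

echo "bare name: $bare_lookup"
echo "explicit path: $explicit_output"

# Clean up the demo directory
cd / && rm -rf "$demo_dir"
```

Once spark/bin is correctly on PATH in the current session, the bare name pyspark should resolve from any directory.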
